'''
Final Project Tutorial
Joed Quaye
Ronald Chomnou
Mark Spooner
Griffin Araujo
'''
# necessary imports
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup
import requests
import numpy as np
from functools import reduce
import pandas as pd
import matplotlib.pyplot as plt
import re
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error
import seaborn as sns
import statsmodels.api as sm
# looking at the past 6 seasons for data analysis
URL2324 = "https://www.basketball-reference.com/leagues/NBA_2024_per_game.html"
URL2223 = "https://www.basketball-reference.com/leagues/NBA_2023_per_game.html"
URL2122 = "https://www.basketball-reference.com/leagues/NBA_2022_per_game.html"
URL2021 = "https://www.basketball-reference.com/leagues/NBA_2021_per_game.html"
URL1920 = "https://www.basketball-reference.com/leagues/NBA_2020_per_game.html"
URL1819 = "https://www.basketball-reference.com/leagues/NBA_2019_per_game.html"
URLMIP1920 = "https://www.basketball-reference.com/awards/awards_2020.html"
URLMIP2021 = "https://www.basketball-reference.com/awards/awards_2021.html"
URLMIP2122 = "https://www.basketball-reference.com/awards/awards_2022.html"
URLMIP2223 = "https://www.basketball-reference.com/awards/awards_2023.html"
URLMIP2324 = "https://www.basketball-reference.com/awards/awards_2024.html"
# URL1718 = "https://www.basketball-reference.com/leagues/NBA_2018_per_game.html"
Introduction
The primary goal of this project is to guide you through the comprehensive process of data analysis within the context of basketball performance metrics. Our focus will be on evaluating the performance data of basketball players over the past six years, specifically to identify trends and correlations that can help predict the Most Improved Player (MIP) in future seasons. The MIP award is given annually to the player who has shown the most significant improvement in their performance, making it an intriguing subject for data-driven analysis.
Why is this important? Recognizing and predicting the Most Improved Player can provide valuable insights into player development, scouting, and team strategy. Understanding the factors that contribute to a player's improvement can help teams invest in potential stars early, optimize training programs, and enhance overall team performance. Moreover, fans and analysts alike can gain a deeper appreciation for the game's dynamics and the players' growth trajectories.
Throughout this project, we will utilize a variety of data science techniques to analyze player statistics, performance metrics, and other relevant data points. By examining past MIP winners and comparing their performance data to other players, we aim to uncover patterns and predictive indicators. Our analysis will focus on several key aspects:
Data Collection: Gathering comprehensive player data from the past six years, including points per game, offensive and defensive rebounds, assists, blocks, and steals.
Data Cleaning and Preparation: Ensuring the data is accurate, complete, and formatted for analysis.
Exploratory Data Analysis (EDA): Visualizing and summarizing the data to identify trends, anomalies, and initial insights.
Feature Engineering: Creating new variables and metrics that might be significant predictors of improvement.
Modeling and Prediction: Applying various statistical and machine learning models to predict future MIP candidates based on historical data.
Evaluation and Interpretation: Assessing the performance of our models and interpreting the results to draw meaningful conclusions.
By following this structured approach, we aim to provide a robust analysis that not only identifies potential future MIP candidates but also enhances our understanding of the factors driving player improvement in professional basketball.
Data Collection
Our first task is to gather the datasets for the analysis: per-game statistics for every player in each of the last six seasons (2018-19 through 2023-24), and the Most Improved Player (MIP) voting rankings for the five most recent of those seasons (2019-20 through 2023-24). Comparing the two lets us look for trends and correlations that could help predict future MIP winners.
We collect both datasets by scraping Basketball Reference with Python. Selenium loads each page in a browser, and BeautifulSoup parses the resulting HTML.
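As a rough illustration of the parsing step, the sketch below runs BeautifulSoup over a small hand-written HTML table instead of a live page. The table id `demo_stats`, its three columns, and both player rows are made up for the example; only the `find` / `find_all` pattern is meant to mirror the scraping functions in this tutorial.

```python
from bs4 import BeautifulSoup

# Hypothetical, hand-written HTML standing in for a downloaded page.
SAMPLE_HTML = """
<table id="demo_stats">
  <tbody>
    <tr><td>Player A</td><td>PG</td><td>21.5</td></tr>
    <tr class="thead"><th>Player</th><th>Pos</th><th>PTS</th></tr>
    <tr><td>Player B</td><td>C</td><td>9.8</td></tr>
  </tbody>
</table>
"""

def parse_demo_table(html_text):
    """Return {player name: {"position": ..., "ppg": ...}} from the sample table."""
    soup = BeautifulSoup(html_text, "html.parser")
    table = soup.find("table", {"id": "demo_stats"})
    players = {}
    for row in table.find("tbody").find_all("tr"):
        cells = row.find_all("td")
        if not cells:
            # a header row repeated inside the body has <th> cells only
            continue
        players[cells[0].text.strip()] = {
            "position": cells[1].text.strip(),
            "ppg": cells[2].text.strip(),
        }
    return players

demo_players = parse_demo_table(SAMPLE_HTML)
print(demo_players)
```

Every value comes back as a string, so numeric conversion is left to a later cleaning step.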
Player Statistics Data Scraping
The following function takes a URL and returns a dictionary with the corresponding player data as output:
# column labels for cells[1] through cells[28] of a table row (cells[0] is the player name)
STAT_NAMES = [
    "position", "age", "team", "games played", "games started",
    "minutes played per game", "field goals", "field goal attempts", "fg percentage",
    "3pt per game", "3pt attempts", "3pt percentage",
    "2pt per game", "2pt attempts", "2pt percentage", "effective fg percentage",
    "free throws", "free throw attempts", "free throw percentage",
    "offensive rebounds", "defensive rebounds", "total rebounds",
    "assists", "steals", "blocks", "turnovers", "personal fouls", "ppg",
]
# function takes in a URL and returns a dictionary with corresponding data as output
def data_scrape(URL):
    # new webdriver
    driver = webdriver.Safari()
    driver.get(URL)
    # reading the data as HTML
    html = BeautifulSoup(driver.page_source, 'html.parser')
    table = html.find('table', {'id': 'per_game_stats'})
    player_stats = {}
    temp = {}
    usedTOT = False
    body = table.find('tbody')
    rows = body.find_all('tr')
    for row in rows:
        try:
            # cells obtains all the column data of each row
            cells = row.find_all('td')
            # first instance of td is the name
            player_name = cells[0].text.strip()
            # for each row, get the appropriate stat according to column location
            stats = {name: cells[i].text.strip() for i, name in enumerate(STAT_NAMES, start=1)}
            # a 'TOT' row holds the season totals of a player who played for multiple teams;
            # store it and apply it to that player's next row
            if cells[3].text == 'TOT':
                temp[player_name] = stats
                usedTOT = True
                continue
            # later team rows of a multi-team player are duplicates (only keeping first)
            if player_name in player_stats:
                continue
            if usedTOT:
                # keep the season totals but record the first listed team
                team = stats["team"]
                stats = temp[player_name]
                stats["team"] = team
            player_stats[player_name] = stats
            usedTOT = False
        except (IndexError, KeyError):
            # rows without the expected cells are skipped
            continue
    driver.quit()
    # returning the player stats
    return player_stats
# obtaining all season data
first_season = data_scrape(URL2324)
second_season = data_scrape(URL2223)
third_season = data_scrape(URL2122)
fourth_season = data_scrape(URL2021)
fifth_season = data_scrape(URL1920)
sixth_season = data_scrape(URL1819)
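The multi-team handling in data_scrape can be looked at in isolation. The sketch below is a simplified stand-in, not the function above: it works on made-up `(player, team, ppg)` tuples and uses a dictionary lookup instead of the `usedTOT` flag, under the assumption that a traded player's 'TOT' row comes first and is followed by one row per team.

```python
# Made-up rows as (player, team, ppg). A traded player has a 'TOT' row first,
# followed by one row per team.
demo_rows = [
    ("Player A", "AAA", "10.0"),
    ("Player B", "TOT", "12.5"),
    ("Player B", "BBB", "11.0"),
    ("Player B", "CCC", "14.0"),
    ("Player C", "DDD", "8.2"),
]

def collapse_traded(rows):
    """Keep one entry per player: season totals for the stat, first listed team for the team."""
    totals = {}
    result = {}
    for name, team, ppg in rows:
        if team == "TOT":
            totals[name] = ppg  # remember the season total for this player
            continue
        if name in result:
            continue  # later team rows for a player already recorded
        result[name] = {"team": team, "ppg": totals.get(name, ppg)}
    return result

collapsed = collapse_traded(demo_rows)
print(collapsed)
```

Player B ends up with the first listed team (BBB) but the season-total scoring figure (12.5), which is the same pairing the scraper produces for players who changed teams.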
Collecting MIP Rankings
The function below scrapes the MIP voting table from a single awards page and returns a dictionary that maps each ranked player to their rank. We call it once per season, using the URLs stored in URLMIP1920, URLMIP2021, URLMIP2122, URLMIP2223, and URLMIP2324.
# Scrape data for all the MIP ranked tables for last 5 seasons
def mip_scrape(URL):
    driver = webdriver.Safari()
    driver.get(URL)
    # Parse the HTML content of the page
    html = BeautifulSoup(driver.page_source, 'html.parser')
    # Find the table containing the most improved players
    table = html.find('table', {'id': 'mip'})
    mipMap = {}
    body = table.find('tbody')
    # Extract the table rows
    rows = body.find_all('tr')
    for row in rows:
        # the rank is in the row header, the player name in the first data cell
        rank = row.find('th').text.strip()
        cells = row.find_all('td')
        player_name = cells[0].text.strip()
        mipMap[player_name] = {}
        # Removes the T from the ranking that indicates you are tied in voting in the tables
        mipMap[player_name]["Rank"] = re.sub(r'\D','',rank)
    driver.quit()
    return mipMap
mipTable1920 = mip_scrape(URLMIP1920)
mipTable2021 = mip_scrape(URLMIP2021)
mipTable2122 = mip_scrape(URLMIP2122)
mipTable2223 = mip_scrape(URLMIP2223)
mipTable2324 = mip_scrape(URLMIP2324)
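The rank clean-up inside mip_scrape relies on a single regular expression. The sketch below applies the same `re.sub(r'\D', '', ...)` call to a few made-up rank strings; the `"10T"` style for tied ranks is an assumption about how the voting table marks ties, and `clean_rank` is a hypothetical helper name.

```python
import re

def clean_rank(raw_rank):
    """Drop every non-digit character, so a tied rank such as '10T' becomes '10'."""
    return re.sub(r'\D', '', raw_rank)

# made-up values in the style of the voting table
sample_ranks = ["1", "2", "10T", "12T"]
cleaned = [clean_rank(r) for r in sample_ranks]
print(cleaned)

# the cleaned ranks are still strings; convert them before sorting or comparing numerically
as_ints = [int(r) for r in cleaned]
print(as_ints)
```

Keeping the ranks as strings matters later: sorted as text, "10" comes before "2", so a numeric conversion is needed before any ordering by rank.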
In this section, we use Pandas and NumPy to organize the scraped data into DataFrames. If you are new to these libraries, the official Pandas and NumPy documentation covers their functionality in detail.
Our goal here is to clean and organize the player statistics and MIP rankings collected from each season into a format that is ready for analysis.
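One cleaning step this goal implies is numeric conversion: the scraper stores every cell as text, and some percentage cells come back as empty strings. The sketch below shows one possible way to handle that with `pd.to_numeric`; the sample rows, the `TEXT_COLUMNS` list, and the `to_numeric_stats` helper are hypothetical and are not taken from the original notebook.

```python
import pandas as pd

# Made-up rows in the same shape as the scraped data: every value is a string,
# and a percentage cell can be empty.
raw = pd.DataFrame(
    {
        "position": ["C", "PG"],
        "team": ["AAA", "BBB"],
        "3pt attempts": ["0.0", "8.7"],
        "3pt percentage": ["", ".373"],
        "ppg": ["11.7", "25.7"],
    },
    index=["Player A", "Player B"],
)

TEXT_COLUMNS = ["position", "team"]  # columns that should stay as text

def to_numeric_stats(df):
    """Return a copy in which every non-text column is numeric; blanks become NaN."""
    out = df.copy()
    for col in out.columns.difference(TEXT_COLUMNS):
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out

clean = to_numeric_stats(raw)
print(clean.dtypes)
print(clean)
```

With `errors="coerce"`, anything that cannot be parsed as a number turns into NaN instead of raising, which keeps the empty percentage cells from stopping the conversion.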
Creating DataFrames from Dictionaries
First, we convert the scraped dictionaries into Pandas DataFrames. Each dictionary represents the data for a specific season, and we use pd.DataFrame.from_dict to perform the conversion. The orient='index' parameter ensures that the dictionary keys become the index of the DataFrame.
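To make the effect of orient='index' concrete, here is a minimal sketch on a made-up nested dictionary in the same shape the scraper returns (player name mapped to a dictionary of stats). The player names and values are placeholders.

```python
import pandas as pd

# Hypothetical nested dictionary: outer keys are players, inner keys are stat names.
nested = {
    "Player A": {"team": "AAA", "ppg": "7.6"},
    "Player B": {"team": "BBB", "ppg": "19.3"},
}

# orient='index' turns each outer key into a row label and each inner key into a column.
frame = pd.DataFrame.from_dict(nested, orient="index")
print(frame)
```

Without orient='index', the default orientation would treat the player names as column labels instead, giving one column per player.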
Display Settings
To ensure we can view the entire contents of the DataFrames, we adjust the display settings of Pandas to show all rows and columns. This helps in verifying the completeness and correctness of our data.
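Changing the display settings globally affects every later print. As an alternative sketch (not what this tutorial does), `pd.option_context` lifts the limits only inside a `with` block and restores the previous values afterwards. The `wide` frame is made up and is only there to have something with many columns to print.

```python
import pandas as pd

# A made-up frame with enough columns that it would usually be truncated when printed.
wide = pd.DataFrame({f"col{i}": range(3) for i in range(30)})

before = pd.get_option("display.max_columns")
with pd.option_context("display.max_rows", None, "display.max_columns", None):
    # inside the block the limits are lifted, so every column is printed
    inside = pd.get_option("display.max_columns")
    print(wide)
after = pd.get_option("display.max_columns")

print(before, inside, after)
```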
Displaying DataFrames
We define functions to display the head and tail of each DataFrame. This provides a quick overview of the data and helps in verifying that the data has been loaded correctly.
# creating dataframe based off created dictionary
data2324 = pd.DataFrame.from_dict(first_season, orient='index')
data2223 = pd.DataFrame.from_dict(second_season, orient='index')
data2122 = pd.DataFrame.from_dict(third_season, orient='index')
data2021 = pd.DataFrame.from_dict(fourth_season, orient='index')
data1920 = pd.DataFrame.from_dict(fifth_season, orient='index')
data1819 = pd.DataFrame.from_dict(sixth_season, orient='index')
mipdata2324 = pd.DataFrame.from_dict(mipTable2324, orient='index')
mipdata2223 = pd.DataFrame.from_dict(mipTable2223, orient='index')
mipdata2122 = pd.DataFrame.from_dict(mipTable2122, orient='index')
mipdata2021 = pd.DataFrame.from_dict(mipTable2021, orient='index')
mipdata1920 = pd.DataFrame.from_dict(mipTable1920, orient='index')
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
count = 24
mipcount = 24
# now printing dataframe data
# count and mipcount hold the two-digit end year of the season printed next;
# each call prints one season and then steps back one year
def data_display(dataframe):
    global count
    print("\n" + "20" + str(count - 1) + "-" + str(count) + " SEASON")
    print(dataframe.head())
    print(dataframe.tail())
    count -= 1
def mip_display(dataframe):
    global mipcount
    print("\n" + "20" + str(mipcount - 1) + "-" + str(mipcount) + " SEASON")
    print(dataframe.head())
    print(dataframe.tail())
    mipcount -= 1
data_display(data2324)
data_display(data2223)
data_display(data2122)
data_display(data2021)
data_display(data1920)
data_display(data1819)
mip_display(mipdata2324)
mip_display(mipdata2223)
mip_display(mipdata2122)
mip_display(mipdata2021)
mip_display(mipdata1920)
2023-24 SEASON
position age team games played games started \
Precious Achiuwa PF-C 24 TOR 74 18
Bam Adebayo C 26 MIA 71 71
Ochai Agbaji SG 23 UTA 78 28
Santi Aldama PF 23 MEM 61 35
Nickeil Alexander-Walker SG 25 MIN 82 20
minutes played per game field goals \
Precious Achiuwa 21.9 3.2
Bam Adebayo 34.0 7.5
Ochai Agbaji 21.0 2.3
Santi Aldama 26.5 4.0
Nickeil Alexander-Walker 23.4 2.9
field goal attempts fg percentage 3pt per game \
Precious Achiuwa 6.3 .501 0.4
Bam Adebayo 14.3 .521 0.2
Ochai Agbaji 5.6 .411 0.8
Santi Aldama 9.3 .435 1.7
Nickeil Alexander-Walker 6.6 .439 1.6
3pt attempts 3pt percentage 2pt per game \
Precious Achiuwa 1.3 .268 2.8
Bam Adebayo 0.6 .357 7.3
Ochai Agbaji 2.7 .294 1.5
Santi Aldama 5.0 .349 2.3
Nickeil Alexander-Walker 4.1 .391 1.3
2pt attempts 2pt percentage effective fg percentage \
Precious Achiuwa 5.0 .562 .529
Bam Adebayo 13.7 .528 .529
Ochai Agbaji 2.8 .523 .483
Santi Aldama 4.3 .534 .528
Nickeil Alexander-Walker 2.5 .517 .560
free throws free throw attempts \
Precious Achiuwa 0.9 1.5
Bam Adebayo 4.1 5.5
Ochai Agbaji 0.5 0.7
Santi Aldama 0.9 1.4
Nickeil Alexander-Walker 0.6 0.8
free throw percentage offensive rebounds \
Precious Achiuwa .616 2.6
Bam Adebayo .755 2.2
Ochai Agbaji .661 0.9
Santi Aldama .621 1.2
Nickeil Alexander-Walker .800 0.4
defensive rebounds total rebounds assists steals \
Precious Achiuwa 4.0 6.6 1.3 0.6
Bam Adebayo 8.1 10.4 3.9 1.1
Ochai Agbaji 1.8 2.8 1.1 0.6
Santi Aldama 4.6 5.8 2.3 0.7
Nickeil Alexander-Walker 1.6 2.0 2.5 0.8
blocks turnovers personal fouls ppg
Precious Achiuwa 0.9 1.1 1.9 7.6
Bam Adebayo 0.9 2.3 2.2 19.3
Ochai Agbaji 0.6 0.8 1.5 5.8
Santi Aldama 0.9 1.1 1.5 10.7
Nickeil Alexander-Walker 0.5 0.9 1.7 8.0
position age team games played games started \
Thaddeus Young PF 35 TOR 33 6
Trae Young PG 25 ATL 54 54
Omer Yurtseven C 25 UTA 48 12
Cody Zeller C 31 NOP 43 0
Ivica Zubac C 26 LAC 68 68
minutes played per game field goals field goal attempts \
Thaddeus Young 13.3 2.0 3.3
Trae Young 36.0 8.0 18.7
Omer Yurtseven 11.4 2.1 3.8
Cody Zeller 7.4 0.6 1.4
Ivica Zubac 26.4 5.0 7.6
fg percentage 3pt per game 3pt attempts 3pt percentage \
Thaddeus Young .602 0.0 0.2 .143
Trae Young .430 3.2 8.7 .373
Omer Yurtseven .538 0.1 0.5 .208
Cody Zeller .419 0.0 0.1 .333
Ivica Zubac .649 0.0 0.0
2pt per game 2pt attempts 2pt percentage \
Thaddeus Young 1.9 3.1 .634
Trae Young 4.8 10.0 .479
Omer Yurtseven 2.0 3.3 .588
Cody Zeller 0.6 1.4 .424
Ivica Zubac 5.0 7.6 .649
effective fg percentage free throws free throw attempts \
Thaddeus Young .606 0.2 0.5
Trae Young .516 6.4 7.5
Omer Yurtseven .552 0.4 0.6
Cody Zeller .427 0.5 0.9
Ivica Zubac .649 1.8 2.4
free throw percentage offensive rebounds defensive rebounds \
Thaddeus Young .400 1.4 1.7
Trae Young .855 0.4 2.3
Omer Yurtseven .679 1.5 2.8
Cody Zeller .605 1.1 1.5
Ivica Zubac .723 2.9 6.3
total rebounds assists steals blocks turnovers personal fouls \
Thaddeus Young 3.1 1.7 0.7 0.2 0.5 1.5
Trae Young 2.8 10.8 1.3 0.2 4.4 2.0
Omer Yurtseven 4.3 0.6 0.2 0.4 0.8 1.1
Cody Zeller 2.6 0.9 0.2 0.1 0.4 1.0
Ivica Zubac 9.2 1.4 0.3 1.2 1.2 2.6
ppg
Thaddeus Young 4.2
Trae Young 25.7
Omer Yurtseven 4.6
Cody Zeller 1.8
Ivica Zubac 11.7
2022-23 SEASON
position age team games played games started \
Precious Achiuwa C 23 TOR 55 12
Steven Adams C 29 MEM 42 42
Bam Adebayo C 25 MIA 75 75
Ochai Agbaji SG 22 UTA 59 22
Santi Aldama PF 22 MEM 77 20
minutes played per game field goals field goal attempts \
Precious Achiuwa 20.7 3.6 7.3
Steven Adams 27.0 3.7 6.3
Bam Adebayo 34.6 8.0 14.9
Ochai Agbaji 20.5 2.8 6.5
Santi Aldama 21.8 3.2 6.8
fg percentage 3pt per game 3pt attempts 3pt percentage \
Precious Achiuwa .485 0.5 2.0 .269
Steven Adams .597 0.0 0.0 .000
Bam Adebayo .540 0.0 0.2 .083
Ochai Agbaji .427 1.4 3.9 .355
Santi Aldama .470 1.2 3.5 .353
2pt per game 2pt attempts 2pt percentage \
Precious Achiuwa 3.0 5.4 .564
Steven Adams 3.7 6.2 .599
Bam Adebayo 8.0 14.7 .545
Ochai Agbaji 1.4 2.7 .532
Santi Aldama 2.0 3.4 .591
effective fg percentage free throws free throw attempts \
Precious Achiuwa .521 1.6 2.3
Steven Adams .597 1.1 3.1
Bam Adebayo .541 4.3 5.4
Ochai Agbaji .532 0.9 1.2
Santi Aldama .560 1.4 1.9
free throw percentage offensive rebounds defensive rebounds \
Precious Achiuwa .702 1.8 4.1
Steven Adams .364 5.1 6.5
Bam Adebayo .806 2.5 6.7
Ochai Agbaji .812 0.7 1.3
Santi Aldama .750 1.1 3.7
total rebounds assists steals blocks turnovers \
Precious Achiuwa 6.0 0.9 0.6 0.5 1.1
Steven Adams 11.5 2.3 0.9 1.1 1.9
Bam Adebayo 9.2 3.2 1.2 0.8 2.5
Ochai Agbaji 2.1 1.1 0.3 0.3 0.7
Santi Aldama 4.8 1.3 0.6 0.6 0.8
personal fouls ppg
Precious Achiuwa 1.9 9.2
Steven Adams 2.3 8.6
Bam Adebayo 2.8 20.4
Ochai Agbaji 1.7 7.9
Santi Aldama 1.9 9.0
position age team games played games started \
Thaddeus Young PF 34 TOR 54 9
Trae Young PG 24 ATL 73 73
Omer Yurtseven C 24 MIA 9 0
Cody Zeller C 30 MIA 15 2
Ivica Zubac C 25 LAC 76 76
minutes played per game field goals field goal attempts \
Thaddeus Young 14.7 2.0 3.7
Trae Young 34.8 8.2 19.0
Omer Yurtseven 9.2 1.8 3.0
Cody Zeller 14.5 2.5 3.9
Ivica Zubac 28.6 4.3 6.8
fg percentage 3pt per game 3pt attempts 3pt percentage \
Thaddeus Young .545 0.1 0.6 .176
Trae Young .429 2.1 6.3 .335
Omer Yurtseven .593 0.3 0.8 .429
Cody Zeller .627 0.0 0.1 .000
Ivica Zubac .634 0.0 0.0 .000
2pt per game 2pt attempts 2pt percentage \
Thaddeus Young 1.9 3.0 .622
Trae Young 6.1 12.7 .476
Omer Yurtseven 1.4 2.2 .650
Cody Zeller 2.5 3.8 .649
Ivica Zubac 4.3 6.7 .637
effective fg percentage free throws free throw attempts \
Thaddeus Young .561 0.3 0.5
Trae Young .485 7.8 8.8
Omer Yurtseven .648 0.6 0.7
Cody Zeller .627 1.6 2.3
Ivica Zubac .634 2.2 3.1
free throw percentage offensive rebounds defensive rebounds \
Thaddeus Young .692 1.3 1.8
Trae Young .886 0.8 2.2
Omer Yurtseven .833 0.9 1.7
Cody Zeller .686 1.7 2.6
Ivica Zubac .697 3.1 6.8
total rebounds assists steals blocks turnovers personal fouls \
Thaddeus Young 3.1 1.4 1.0 0.1 0.8 1.6
Trae Young 3.0 10.2 1.1 0.1 4.1 1.4
Omer Yurtseven 2.6 0.2 0.2 0.2 0.4 1.8
Cody Zeller 4.3 0.7 0.2 0.3 0.9 2.2
Ivica Zubac 9.9 1.0 0.4 1.3 1.5 2.9
ppg
Thaddeus Young 4.4
Trae Young 26.2
Omer Yurtseven 4.4
Cody Zeller 6.5
Ivica Zubac 10.8
2021-22 SEASON
position age team games played games started \
Precious Achiuwa C 22 TOR 73 28
Steven Adams C 28 MEM 76 75
Bam Adebayo C 24 MIA 56 56
Santi Aldama PF 21 MEM 32 0
LaMarcus Aldridge C 36 BRK 47 12
minutes played per game field goals field goal attempts \
Precious Achiuwa 23.6 3.6 8.3
Steven Adams 26.3 2.8 5.1
Bam Adebayo 32.6 7.3 13.0
Santi Aldama 11.3 1.7 4.1
LaMarcus Aldridge 22.3 5.4 9.7
fg percentage 3pt per game 3pt attempts 3pt percentage \
Precious Achiuwa .439 0.8 2.1 .359
Steven Adams .547 0.0 0.0 .000
Bam Adebayo .557 0.0 0.1 .000
Santi Aldama .402 0.2 1.5 .125
LaMarcus Aldridge .550 0.3 1.0 .304
2pt per game 2pt attempts 2pt percentage \
Precious Achiuwa 2.9 6.1 .468
Steven Adams 2.8 5.0 .548
Bam Adebayo 7.3 12.9 .562
Santi Aldama 1.5 2.6 .560
LaMarcus Aldridge 5.1 8.8 .578
effective fg percentage free throws free throw attempts \
Precious Achiuwa .486 1.1 1.8
Steven Adams .547 1.4 2.6
Bam Adebayo .557 4.6 6.1
Santi Aldama .424 0.6 1.0
LaMarcus Aldridge .566 1.9 2.2
free throw percentage offensive rebounds defensive rebounds \
Precious Achiuwa .595 2.0 4.5
Steven Adams .543 4.6 5.4
Bam Adebayo .753 2.4 7.6
Santi Aldama .625 1.0 1.7
LaMarcus Aldridge .873 1.6 3.9
total rebounds assists steals blocks turnovers \
Precious Achiuwa 6.5 1.1 0.5 0.6 1.2
Steven Adams 10.0 3.4 0.9 0.8 1.5
Bam Adebayo 10.1 3.4 1.4 0.8 2.6
Santi Aldama 2.7 0.7 0.2 0.3 0.5
LaMarcus Aldridge 5.5 0.9 0.3 1.0 0.9
personal fouls ppg
Precious Achiuwa 2.1 9.1
Steven Adams 2.0 6.9
Bam Adebayo 3.1 19.1
Santi Aldama 1.1 4.1
LaMarcus Aldridge 1.7 12.9
position age team games played games started \
Thaddeus Young PF 33 SAS 52 1
Trae Young PG 23 ATL 76 76
Omer Yurtseven C 23 MIA 56 12
Cody Zeller C 29 POR 27 0
Ivica Zubac C 24 LAC 76 76
minutes played per game field goals field goal attempts \
Thaddeus Young 16.3 2.7 5.2
Trae Young 34.9 9.4 20.3
Omer Yurtseven 12.6 2.3 4.4
Cody Zeller 13.1 1.9 3.3
Ivica Zubac 24.4 4.1 6.5
fg percentage 3pt per game 3pt attempts 3pt percentage \
Thaddeus Young .518 0.3 0.9 .354
Trae Young .460 3.1 8.0 .382
Omer Yurtseven .526 0.0 0.2 .091
Cody Zeller .567 0.0 0.1 .000
Ivica Zubac .626 0.0 0.0
2pt per game 2pt attempts 2pt percentage \
Thaddeus Young 2.4 4.3 .554
Trae Young 6.3 12.3 .512
Omer Yurtseven 2.3 4.2 .547
Cody Zeller 1.9 3.2 .593
Ivica Zubac 4.1 6.5 .626
effective fg percentage free throws free throw attempts \
Thaddeus Young .550 0.4 0.9
Trae Young .536 6.6 7.3
Omer Yurtseven .528 0.7 1.1
Cody Zeller .567 1.4 1.8
Ivica Zubac .626 2.2 3.0
free throw percentage offensive rebounds defensive rebounds \
Thaddeus Young .469 1.5 2.5
Trae Young .904 0.7 3.1
Omer Yurtseven .623 1.5 3.7
Cody Zeller .776 1.9 2.8
Ivica Zubac .727 2.9 5.6
total rebounds assists steals blocks turnovers personal fouls \
Thaddeus Young 4.0 2.0 1.0 0.3 1.0 1.6
Trae Young 3.7 9.7 0.9 0.1 4.0 1.7
Omer Yurtseven 5.3 0.9 0.3 0.4 0.7 1.5
Cody Zeller 4.6 0.8 0.3 0.2 0.7 2.1
Ivica Zubac 8.5 1.6 0.5 1.0 1.5 2.7
ppg
Thaddeus Young 6.2
Trae Young 28.4
Omer Yurtseven 5.3
Cody Zeller 5.2
Ivica Zubac 10.3
2020-21 SEASON
position age team games played games started \
Precious Achiuwa PF 21 MIA 61 4
Jaylen Adams PG 24 MIL 7 0
Steven Adams C 27 NOP 58 58
Bam Adebayo C 23 MIA 64 64
LaMarcus Aldridge C 35 SAS 26 23
minutes played per game field goals field goal attempts \
Precious Achiuwa 12.1 2.0 3.7
Jaylen Adams 2.6 0.1 1.1
Steven Adams 27.7 3.3 5.3
Bam Adebayo 33.5 7.1 12.5
LaMarcus Aldridge 25.9 5.4 11.4
fg percentage 3pt per game 3pt attempts 3pt percentage \
Precious Achiuwa .544 0.0 0.0 .000
Jaylen Adams .125 0.0 0.3 .000
Steven Adams .614 0.0 0.1 .000
Bam Adebayo .570 0.0 0.1 .250
LaMarcus Aldridge .473 1.2 3.1 .388
2pt per game 2pt attempts 2pt percentage \
Precious Achiuwa 2.0 3.7 .546
Jaylen Adams 0.1 0.9 .167
Steven Adams 3.3 5.3 .620
Bam Adebayo 7.1 12.4 .573
LaMarcus Aldridge 4.2 8.3 .505
effective fg percentage free throws free throw attempts \
Precious Achiuwa .544 0.9 1.8
Jaylen Adams .125 0.0 0.0
Steven Adams .614 1.0 2.3
Bam Adebayo .571 4.4 5.5
LaMarcus Aldridge .525 1.6 1.8
free throw percentage offensive rebounds defensive rebounds \
Precious Achiuwa .509 1.2 2.2
Jaylen Adams 0.0 0.4
Steven Adams .444 3.7 5.2
Bam Adebayo .799 2.2 6.7
LaMarcus Aldridge .872 0.7 3.8
total rebounds assists steals blocks turnovers \
Precious Achiuwa 3.4 0.5 0.3 0.5 0.7
Jaylen Adams 0.4 0.3 0.0 0.0 0.0
Steven Adams 8.9 1.9 0.9 0.7 1.3
Bam Adebayo 9.0 5.4 1.2 1.0 2.6
LaMarcus Aldridge 4.5 1.9 0.4 1.1 1.0
personal fouls ppg
Precious Achiuwa 1.5 5.0
Jaylen Adams 0.1 0.3
Steven Adams 1.9 7.6
Bam Adebayo 2.3 18.7
LaMarcus Aldridge 1.8 13.5
position age team games played games started \
Delon Wright PG 28 DET 63 39
Thaddeus Young PF 32 CHI 68 23
Trae Young PG 22 ATL 63 63
Cody Zeller C 28 CHO 48 21
Ivica Zubac C 23 LAC 72 33
minutes played per game field goals field goal attempts \
Delon Wright 27.7 3.8 8.2
Thaddeus Young 24.3 5.4 9.7
Trae Young 33.7 7.7 17.7
Cody Zeller 20.9 3.8 6.8
Ivica Zubac 22.3 3.6 5.5
fg percentage 3pt per game 3pt attempts 3pt percentage \
Delon Wright .463 1.0 2.7 .372
Thaddeus Young .559 0.2 0.7 .267
Trae Young .438 2.2 6.3 .343
Cody Zeller .559 0.1 0.6 .143
Ivica Zubac .652 0.0 0.1 .250
2pt per game 2pt attempts 2pt percentage \
Delon Wright 2.8 5.5 .509
Thaddeus Young 5.3 9.1 .580
Trae Young 5.6 11.3 .491
Cody Zeller 3.7 6.2 .598
Ivica Zubac 3.6 5.4 .656
effective fg percentage free throws free throw attempts \
Delon Wright .525 1.6 2.0
Thaddeus Young .568 1.0 1.7
Trae Young .499 7.7 8.7
Cody Zeller .565 1.8 2.5
Ivica Zubac .654 1.9 2.4
free throw percentage offensive rebounds defensive rebounds \
Delon Wright .802 1.0 3.2
Thaddeus Young .628 2.5 3.8
Trae Young .886 0.6 3.3
Cody Zeller .714 2.5 4.4
Ivica Zubac .789 2.6 4.6
total rebounds assists steals blocks turnovers personal fouls \
Delon Wright 4.3 4.4 1.6 0.5 1.3 1.2
Thaddeus Young 6.2 4.3 1.1 0.6 2.0 2.2
Trae Young 3.9 9.4 0.8 0.2 4.1 1.8
Cody Zeller 6.8 1.8 0.6 0.4 1.1 2.5
Ivica Zubac 7.2 1.3 0.3 0.9 1.1 2.6
ppg
Delon Wright 10.2
Thaddeus Young 12.1
Trae Young 25.3
Cody Zeller 9.4
Ivica Zubac 9.0
2019-20 SEASON
position age team games played games started \
Steven Adams C 26 OKC 63 63
Bam Adebayo PF 22 MIA 72 72
LaMarcus Aldridge C 34 SAS 53 53
Kyle Alexander C 23 MIA 2 0
Nickeil Alexander-Walker SG 21 NOP 47 1
minutes played per game field goals \
Steven Adams 26.7 4.5
Bam Adebayo 33.6 6.1
LaMarcus Aldridge 33.1 7.4
Kyle Alexander 6.5 0.5
Nickeil Alexander-Walker 12.6 2.1
field goal attempts fg percentage 3pt per game \
Steven Adams 7.6 .592 0.0
Bam Adebayo 11.0 .557 0.0
LaMarcus Aldridge 15.0 .493 1.2
Kyle Alexander 1.0 .500 0.0
Nickeil Alexander-Walker 5.7 .368 1.0
3pt attempts 3pt percentage 2pt per game \
Steven Adams 0.0 .333 4.5
Bam Adebayo 0.2 .143 6.1
LaMarcus Aldridge 3.0 .389 6.2
Kyle Alexander 0.0 0.5
Nickeil Alexander-Walker 2.8 .346 1.1
2pt attempts 2pt percentage effective fg percentage \
Steven Adams 7.5 .594 .593
Bam Adebayo 10.8 .564 .558
LaMarcus Aldridge 12.0 .519 .532
Kyle Alexander 1.0 .500 .500
Nickeil Alexander-Walker 2.8 .391 .455
free throws free throw attempts \
Steven Adams 1.9 3.2
Bam Adebayo 3.7 5.3
LaMarcus Aldridge 3.0 3.6
Kyle Alexander 0.0 0.0
Nickeil Alexander-Walker 0.5 0.8
free throw percentage offensive rebounds \
Steven Adams .582 3.3
Bam Adebayo .691 2.4
LaMarcus Aldridge .827 1.9
Kyle Alexander 1.0
Nickeil Alexander-Walker .676 0.2
defensive rebounds total rebounds assists steals \
Steven Adams 6.0 9.3 2.3 0.8
Bam Adebayo 7.8 10.2 5.1 1.1
LaMarcus Aldridge 5.5 7.4 2.4 0.7
Kyle Alexander 0.5 1.5 0.0 0.0
Nickeil Alexander-Walker 1.6 1.8 1.9 0.4
blocks turnovers personal fouls ppg
Steven Adams 1.1 1.5 1.9 10.9
Bam Adebayo 1.3 2.8 2.5 15.9
LaMarcus Aldridge 1.6 1.4 2.4 18.9
Kyle Alexander 0.0 0.5 0.5 1.0
Nickeil Alexander-Walker 0.2 1.1 1.2 5.7
position age team games played games started \
Trae Young PG 21 ATL 60 60
Cody Zeller C 27 CHO 58 39
Tyler Zeller C 30 SAS 2 0
Ante Žižić C 23 CLE 22 0
Ivica Zubac C 22 LAC 72 70
minutes played per game field goals field goal attempts \
Trae Young 35.3 9.1 20.8
Cody Zeller 23.1 4.3 8.3
Tyler Zeller 2.0 0.5 2.0
Ante Žižić 10.0 1.9 3.3
Ivica Zubac 18.4 3.3 5.3
fg percentage 3pt per game 3pt attempts 3pt percentage \
Trae Young .437 3.4 9.5 .361
Cody Zeller .524 0.3 1.3 .240
Tyler Zeller .250 0.0 0.0
Ante Žižić .569 0.0 0.0
Ivica Zubac .613 0.0 0.0 .000
2pt per game 2pt attempts 2pt percentage effective fg percentage \
Trae Young 5.7 11.4 .501 .519
Cody Zeller 4.0 7.0 .577 .543
Tyler Zeller 0.5 2.0 .250 .250
Ante Žižić 1.9 3.3 .569 .569
Ivica Zubac 3.3 5.3 .616 .613
free throws free throw attempts free throw percentage \
Trae Young 8.0 9.3 .860
Cody Zeller 2.1 3.1 .682
Tyler Zeller 0.0 0.0
Ante Žižić 0.6 0.9 .737
Ivica Zubac 1.7 2.3 .747
offensive rebounds defensive rebounds total rebounds assists \
Trae Young 0.5 3.7 4.3 9.3
Cody Zeller 2.8 4.3 7.1 1.5
Tyler Zeller 1.5 0.5 2.0 0.0
Ante Žižić 0.8 2.2 3.0 0.3
Ivica Zubac 2.7 4.8 7.5 1.1
steals blocks turnovers personal fouls ppg
Trae Young 1.1 0.1 4.8 1.7 29.6
Cody Zeller 0.7 0.4 1.3 2.4 11.1
Tyler Zeller 0.0 0.0 0.0 0.0 1.0
Ante Žižić 0.3 0.2 0.5 1.2 4.4
Ivica Zubac 0.2 0.9 0.8 2.3 8.3
2018-19 SEASON
position age team games played games started \
Álex Abrines SG 25 OKC 31 2
Quincy Acy PF 28 PHO 10 0
Jaylen Adams PG 22 ATL 34 1
Steven Adams C 25 OKC 80 80
Bam Adebayo C 21 MIA 82 28
minutes played per game field goals field goal attempts \
Álex Abrines 19.0 1.8 5.1
Quincy Acy 12.3 0.4 1.8
Jaylen Adams 12.6 1.1 3.2
Steven Adams 33.4 6.0 10.1
Bam Adebayo 23.3 3.4 5.9
fg percentage 3pt per game 3pt attempts 3pt percentage \
Álex Abrines .357 1.3 4.1 .323
Quincy Acy .222 0.2 1.5 .133
Jaylen Adams .345 0.7 2.2 .338
Steven Adams .595 0.0 0.0 .000
Bam Adebayo .576 0.0 0.2 .200
2pt per game 2pt attempts 2pt percentage effective fg percentage \
Álex Abrines 0.5 1.0 .500 .487
Quincy Acy 0.2 0.3 .667 .278
Jaylen Adams 0.4 1.1 .361 .459
Steven Adams 6.0 10.1 .596 .595
Bam Adebayo 3.4 5.7 .588 .579
free throws free throw attempts free throw percentage \
Álex Abrines 0.4 0.4 .923
Quincy Acy 0.7 1.0 .700
Jaylen Adams 0.2 0.3 .778
Steven Adams 1.8 3.7 .500
Bam Adebayo 2.0 2.8 .735
offensive rebounds defensive rebounds total rebounds assists \
Álex Abrines 0.2 1.4 1.5 0.6
Quincy Acy 0.3 2.2 2.5 0.8
Jaylen Adams 0.3 1.4 1.8 1.9
Steven Adams 4.9 4.6 9.5 1.6
Bam Adebayo 2.0 5.3 7.3 2.2
steals blocks turnovers personal fouls ppg
Álex Abrines 0.5 0.2 0.5 1.7 5.3
Quincy Acy 0.1 0.4 0.4 2.4 1.7
Jaylen Adams 0.4 0.1 0.8 1.3 3.2
Steven Adams 1.5 1.0 1.7 2.6 13.9
Bam Adebayo 0.9 0.8 1.5 2.5 8.9
position age team games played games started \
Trae Young PG 20 ATL 81 81
Cody Zeller C 26 CHO 49 47
Tyler Zeller C 29 ATL 6 1
Ante Žižić C 22 CLE 59 25
Ivica Zubac C 21 LAL 59 37
minutes played per game field goals field goal attempts \
Trae Young 30.9 6.5 15.5
Cody Zeller 25.4 3.9 7.0
Tyler Zeller 15.5 2.7 5.0
Ante Žižić 18.3 3.1 5.6
Ivica Zubac 17.6 3.6 6.4
fg percentage 3pt per game 3pt attempts 3pt percentage \
Trae Young .418 1.9 6.0 .324
Cody Zeller .551 0.1 0.4 .273
Tyler Zeller .533 0.0 0.2 .000
Ante Žižić .553 0.0 0.0
Ivica Zubac .559 0.0 0.0
2pt per game 2pt attempts 2pt percentage effective fg percentage \
Trae Young 4.6 9.6 .477 .480
Cody Zeller 3.8 6.6 .570 .559
Tyler Zeller 2.7 4.8 .552 .533
Ante Žižić 3.1 5.6 .553 .553
Ivica Zubac 3.6 6.4 .559 .559
free throws free throw attempts free throw percentage \
Trae Young 4.2 5.1 .829
Cody Zeller 2.3 2.9 .787
Tyler Zeller 2.3 3.0 .778
Ante Žižić 1.6 2.2 .705
Ivica Zubac 1.7 2.1 .802
offensive rebounds defensive rebounds total rebounds assists \
Trae Young 0.8 2.9 3.7 8.1
Cody Zeller 2.2 4.6 6.8 2.1
Tyler Zeller 1.8 2.2 4.0 0.7
Ante Žižić 1.8 3.6 5.4 0.9
Ivica Zubac 1.9 4.2 6.1 1.1
steals blocks turnovers personal fouls ppg
Trae Young 0.9 0.2 3.8 1.7 19.1
Cody Zeller 0.8 0.8 1.3 3.3 10.1
Tyler Zeller 0.2 0.5 0.7 3.3 7.7
Ante Žižić 0.2 0.4 1.0 1.9 7.8
Ivica Zubac 0.2 0.9 1.2 2.3 8.9
2023-24 SEASON
Rank
Tyrese Maxey 1
Coby White 2
Alperen Sengun 3
Jalen Williams 4
Jalen Brunson 5
Rank
Grayson Allen 10
Duncan Robinson 10
Shai Gilgeous-Alexander 12
Devin Vassell 12
Aaron Nesmith 14
2022-23 SEASON
Rank
Lauri Markkanen 1
Shai Gilgeous-Alexander 2
Jalen Brunson 3
Mikal Bridges 4
Nic Claxton 5
Rank
Kevon Looney 8
Austin Reaves 10
Aaron Gordon 11
Jaren Jackson Jr. 11
Malik Monk 11
2021-22 SEASON
Rank
Ja Morant 1
Dejounte Murray 2
Darius Garland 3
Jordan Poole 4
Desmond Bane 5
Rank
Anfernee Simons 8
Robert Williams 9
Jaren Jackson Jr. 10
Jalen Brunson 11
Max Strus 12
2020-21 SEASON
Rank
Julius Randle 1
Jerami Grant 2
Michael Porter Jr. 3
Christian Wood 4
Zach LaVine 5
Rank
Shai Gilgeous-Alexander 19
Richaun Holmes 19
T.J. McConnell 19
Terry Rozier 19
Andrew Wiggins 19
2019-20 SEASON
Rank
Brandon Ingram 1
Bam Adebayo 2
Luka Dončić 3
Jayson Tatum 4
Devonte' Graham 5
Rank
Dāvis Bertāns 11
Jaylen Brown 11
Markelle Fultz 13
Spencer Dinwiddie 14
Duncan Robinson 14
Data Processing Second Part
In this part of the data processing section, we extract the Most Improved Player (MIP) winner from each season's MIP rankings and store the names in a list, giving us the award winners for the past five seasons. This step matters for the analysis because it lets us compare the winners' performance metrics against those of every other player. It also lets us verify the scraped data before the deeper exploration and modeling aimed at identifying future MIP candidates.
pastMipWinners = []
def getMipWinners(mipYear, mipArray):
    # The first index label of each season's table is the rank-1 player
    mipArray.append(mipYear.index[0])
    return mipArray
pastMipWinners = getMipWinners(mipdata1920, pastMipWinners)
pastMipWinners = getMipWinners(mipdata2021, pastMipWinners)
pastMipWinners = getMipWinners(mipdata2122, pastMipWinners)
pastMipWinners = getMipWinners(mipdata2223, pastMipWinners)
pastMipWinners = getMipWinners(mipdata2324, pastMipWinners)
print(pastMipWinners)
['Brandon Ingram', 'Julius Randle', 'Ja Morant', 'Lauri Markkanen', 'Tyrese Maxey']
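The extraction above relies on each MIP table already being sorted so that the winner sits in the first row. As a minimal sketch, assuming each table is indexed by player name and has a `Rank` column as in the printed output, the winner could instead be looked up by the smallest rank, which does not depend on row order. The toy tables and the helper name `winner_by_rank` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Toy stand-ins for two seasons of MIP voting (hypothetical, deliberately unsorted)
toy_1920 = pd.DataFrame({"Rank": ["2", "1", "3"]},
                        index=["Bam Adebayo", "Brandon Ingram", "Luka Dončić"])
toy_2021 = pd.DataFrame({"Rank": ["1", "2", "3"]},
                        index=["Julius Randle", "Jerami Grant", "Michael Porter Jr."])

def winner_by_rank(mip_table):
    """Return the index label (player name) with the smallest numeric Rank."""
    ranks = pd.to_numeric(mip_table["Rank"])
    return ranks.idxmin()

toy_winners = [winner_by_rank(t) for t in (toy_1920, toy_2021)]
print(toy_winners)
```

If two players ever shared the lowest rank, `idxmin` would return only the first of them, so this sketch assumes a single winner per season.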
Create correlation graphs that are useful for determining the Most Improved Player
- Pure offensive stats to MIP winners (PPG, assists, offensive rebounds, FG%)
- Pure defensive stats to MIP winners (steals, blocks, defensive rebounds)
- Impact on game (increase in games played and minutes over seasons, FGA vs FGA)
- Prediction of the Most Improved Player for the 2024-2025 season based on coefficients
Past 5 winners (season, player, position, team, FG%, PPG, RPG, APG, BPG)
- 2024 Tyrese Maxey PG Philadelphia 76ers .450 25.9 3.7 6.2 0.5
- 2023 Lauri Markkanen PF Utah Jazz .499 25.6 8.6 1.9 0.6
- 2022 Ja Morant PG Memphis Grizzlies .493 27.4 5.7 6.7 0.4
- 2021 Julius Randle PF New York Knicks .456 24.1 10.2 6.0 0.3
- 2020 Brandon Ingram F New Orleans Pelicans .463 23.8 6.1 4.2 0.6
Next, we create graphs that show the growth or decline in offensive performance, specifically Points Per Game (PPG), between consecutive seasons. A difference only exists for a player who appeared in both seasons, so we first clean the data again, then calculate the PPG difference, and finally plot it. Any player who did not play in both seasons is dropped from the table. The player with the highest growth in PPG is highlighted in green and the MIP winner is highlighted in red.
# Any player who did not play in both seasons will have NaN values after the subtraction and is dropped from the table
# Store PPG, assists, offensive rebounds and FG% in a data frame and replace empty strings with NaN so they can be dropped later
pureOffensive1819 = data1819[['ppg','assists','offensive rebounds','fg percentage']].replace('', np.nan)
pureOffensive1920 = data1920[['ppg','assists','offensive rebounds','fg percentage']].replace('', np.nan)
pureOffensive2021 = data2021[['ppg','assists','offensive rebounds','fg percentage']].replace('', np.nan)
pureOffensive2122 = data2122[['ppg','assists','offensive rebounds','fg percentage']].replace('', np.nan)
pureOffensive2223 = data2223[['ppg','assists','offensive rebounds','fg percentage']].replace('', np.nan)
pureOffensive2324 = data2324[['ppg','assists','offensive rebounds','fg percentage']].replace('', np.nan)
# Convert each stat to float so consecutive seasons can be subtracted
pureOffensive1819 = pureOffensive1819.astype(float) # 2018-19 season
pureOffensive1920 = pureOffensive1920.astype(float)
pureOffensive2021 = pureOffensive2021.astype(float)
pureOffensive2122 = pureOffensive2122.astype(float)
pureOffensive2223 = pureOffensive2223.astype(float)
pureOffensive2324 = pureOffensive2324.astype(float)
# Subtract the previous season from each season so growth (positive) and decline (negative) show up in the data frame
pureOffensive1820 = pureOffensive1920.sub(pureOffensive1819)
pureOffensive1921 = pureOffensive2021.sub(pureOffensive1920)
pureOffensive2022 = pureOffensive2122.sub(pureOffensive2021)
pureOffensive2123 = pureOffensive2223.sub(pureOffensive2122)
pureOffensive2224 = pureOffensive2324.sub(pureOffensive2223)
# print(pureOffensive1820)
allOffense = [pureOffensive1820, pureOffensive1921, pureOffensive2022, pureOffensive2123, pureOffensive2224]
yc = 0
years = ["19-20", "20-21", "21-22", "22-23", "23-24"]
def get_ppg_diff(offense, mip, year):
    # Drop players missing from either season (NaN after the subtraction)
    offense.dropna(inplace=True)
    # Sort the DataFrame by the difference in PPG
    sorted_growth_df = offense.sort_values(by='ppg', ascending=False)
    # Plotting the difference in PPG for all players
    plt.figure(figsize=(10, 6))
    plt.barh(sorted_growth_df.index, sorted_growth_df['ppg'], color='skyblue')
    plt.title(f'Difference in Points Per Game (PPG) in {year} from last season')
    plt.xlabel('Difference in PPG')
    plt.ylabel('Players')
    # Highlighting the player with the highest difference
    highest_difference_player = sorted_growth_df.index[0]
    highest_difference = sorted_growth_df.loc[highest_difference_player, 'ppg']
    plt.barh(highest_difference_player, highest_difference, color='green', label='Highest PPG growth Player')
    # Annotating the highest difference (the first sorted bar sits at y-position 0)
    plt.text(highest_difference, 0, f"{highest_difference_player}: {highest_difference}", va='bottom', ha='right', color='green')
    # Highlighting the award player
    if mip in sorted_growth_df.index:
        award_player_difference = sorted_growth_df.loc[mip, 'ppg']
        plt.barh(mip, award_player_difference, color='red', label=f'MIP Winner {year} Season')
        # Annotate at the MIP's position in the sorted order used for the bars
        plt.text(award_player_difference, sorted_growth_df.index.get_loc(mip), f'{mip}: {award_player_difference}', va='bottom', ha='right', color='red')
    plt.legend(loc='lower left')
    plt.tight_layout()
    plt.show()
for offense in allOffense:
    mip = pastMipWinners[yc]
    get_ppg_diff(offense, mip, years[yc])
    yc += 1
print(pureOffensive1820)
ppg assists offensive rebounds fg percentage Aaron Gordon -1.6 0.0 0.0 -0.012 Aaron Holiday 3.6 1.7 0.2 0.013 Abdel Nader 2.3 0.4 0.1 0.045 Al Horford -1.7 -0.2 -0.3 -0.085 Al-Farouq Aminu -5.1 -0.1 -0.1 -0.142 Alec Burks 6.2 0.9 0.2 0.013 Alex Caruso -3.7 -1.2 -0.5 -0.033 Alex Len -3.1 -0.2 -0.3 0.061 Alfonzo McKinnie -0.1 0.0 -0.2 -0.060 Alize Johnson 1.1 0.3 0.6 0.164 Allen Crabbe -5.0 -0.2 -0.1 -0.011 Allonzo Trier -4.4 -0.7 -0.2 0.033 Amile Jefferson -1.5 -0.1 0.1 -0.268 Andre Drummond 0.4 1.3 -1.0 0.000 Andre Iguodala -1.1 -0.8 0.1 -0.068 Andrew Wiggins 3.7 1.2 0.1 0.035 Anfernee Simons 4.5 0.7 0.2 -0.045 Ante Žižić -3.4 -0.6 -1.0 0.016 Anthony Davis 0.2 -0.7 -0.8 -0.014 Anthony Tolliver -1.4 0.1 0.4 -0.025 Aron Baynes 5.9 0.5 0.0 0.009 Austin Rivers 0.7 -0.5 0.1 0.015 Avery Bradley -1.3 -1.1 -0.3 0.036 B.J. Johnson -0.3 0.3 0.0 -0.219 Bam Adebayo 7.0 2.9 0.4 -0.019 Ben McLemore 6.2 0.6 0.1 0.053 Ben Simmons -0.5 0.3 -0.2 0.017 Bismack Biyombo 3.0 0.3 0.8 -0.028 Blake Griffin -9.0 -2.1 -0.4 -0.110 Boban Marjanović -0.7 -0.4 0.0 -0.042 Bobby Portis -4.1 0.1 -1.0 0.006 Bogdan Bogdanović 1.0 -0.4 -0.2 0.022 Bojan Bogdanović 2.2 0.1 0.2 -0.050 Brad Wanamaker 3.0 0.9 0.2 -0.028 Bradley Beal 4.9 0.6 -0.2 -0.020 Brandon Goodwin 4.7 0.6 0.2 0.139 Brandon Ingram 5.5 1.2 0.0 -0.034 Brandon Knight 0.5 1.0 -0.1 -0.028 Brook Lopez -0.5 0.3 0.5 -0.017 Bruce Brown 4.6 2.8 0.5 0.045 Bruno Caboclo -5.3 -1.1 -0.4 0.000 Bryn Forbes -0.6 -0.4 0.0 -0.039 Buddy Hield -1.5 0.5 -0.5 -0.029 C.J. 
Miles 0.0 0.5 0.2 -0.038 CJ McCollum 1.2 1.4 -0.2 -0.008 Caleb Swanigan 0.4 0.6 0.2 0.256 Cameron Payne 4.6 0.3 0.2 0.055 Caris LeVert 5.0 0.5 0.2 -0.004 Carmelo Anthony 2.0 1.0 0.3 0.025 Cedi Osman -2.0 -0.2 0.0 0.010 Chandler Hutchison 2.6 0.1 -0.1 -0.002 Chandler Parsons -4.7 -1.1 0.0 -0.096 Chasson Randle -3.8 -0.3 -0.2 -0.419 Cheick Diallo -1.3 0.0 -0.6 0.028 Chimezie Metu 1.4 0.2 0.4 0.243 Chris Boucher 3.3 0.3 1.1 0.025 Chris Chiozza 4.2 2.4 0.2 0.143 Chris Paul 2.0 -1.5 -0.2 0.070 Christian Wood 4.9 0.6 0.9 0.046 Clint Capela -2.7 -0.2 -0.1 -0.019 Cody Zeller 1.0 -0.6 0.6 -0.027 Collin Sexton 4.1 0.0 0.2 0.042 Corey Brewer -3.9 -0.9 -0.4 0.069 Cory Joseph -0.1 -0.4 0.1 0.003 Courtney Lee 0.5 -0.6 0.0 0.077 Cristiano Felício -0.1 0.1 1.2 0.099 D'Angelo Russell 2.0 -0.7 -0.3 -0.008 D.J. Augustin -1.2 -0.7 -0.1 -0.071 D.J. Wilson -2.2 -0.4 -0.6 -0.020 Damian Jones 0.2 -0.6 0.0 -0.036 Damian Lillard 4.2 1.1 -0.4 0.019 Damion Lee 7.8 2.3 0.4 -0.024 Damyean Dotson -4.0 -0.6 -0.3 -0.001 Daniel Theis 3.5 0.7 0.9 0.017 Danilo Gallinari -1.1 -0.7 -0.3 -0.025 Danny Green -2.3 -0.3 0.0 -0.049 Dante Exum -2.4 -1.5 -0.1 0.052 Danuel House Jr. 1.1 0.3 0.3 -0.041 Dario Šarić 0.1 0.3 -0.1 0.039 Daryl Macon -2.8 -0.6 -0.3 -0.037 David Nwaba -1.3 -0.7 -0.4 0.040 De'Aaron Fox 3.8 -0.5 0.2 0.022 De'Anthony Melton 2.6 -0.3 0.2 0.010 DeAndre Jordan -2.7 -0.4 -0.8 0.025 DeAndre' Bembry -2.6 -0.6 0.1 0.010 DeMar DeRozan 0.9 -0.6 -0.1 0.050 DeMarre Carroll -7.5 -0.3 -0.5 -0.016 Deandre Ayton 1.9 0.1 0.8 -0.039 Delon Wright -1.8 0.0 0.1 0.028 Dennis Schröder 3.4 -0.1 -0.2 0.055 Dennis Smith Jr. -8.1 -1.9 0.0 -0.087 Deonte Burton 0.1 0.1 0.1 -0.058 Derrick Favors -2.8 0.4 0.5 0.031 Derrick Jones Jr. 
1.5 0.5 -0.5 0.033 Derrick Rose 0.1 1.3 -0.1 0.008 Derrick White 1.4 -0.4 0.0 -0.021 Devin Booker 0.0 -0.3 -0.2 0.022 Devonte' Graham 13.5 4.9 0.5 0.039 Dewayne Dedmon -5.0 -0.9 -0.2 -0.092 Dillon Brooks 8.7 1.2 0.4 0.005 Dion Waiters -0.9 -0.8 0.0 0.000 Domantas Sabonis 4.4 2.1 0.5 -0.050 Donovan Mitchell 0.2 0.1 0.0 0.017 Donte DiVincenzo 4.3 1.2 0.4 0.052 Dorian Finney-Smith 2.0 0.4 0.3 0.034 Doug McDermott 3.0 0.2 0.2 -0.003 Dragan Bender 1.7 0.6 0.3 -0.001 Draymond Green 0.6 -0.7 -0.4 -0.056 Drew Eubanks 3.1 0.4 1.0 0.065 Duncan Robinson 10.2 1.1 0.0 0.079 Dusty Hannahs 2.0 -2.5 0.0 0.194 Dwayne Bacon -1.6 0.2 0.2 -0.127 Dwight Howard -5.3 0.3 -0.2 0.106 Dwight Powell -1.2 0.0 0.1 0.041 Dāvis Bertāns 7.4 0.4 0.3 -0.016 Džanan Musa 2.7 0.9 0.4 -0.037 E'Twaun Moore -3.6 -0.5 -0.1 -0.055 Ed Davis -4.0 -0.4 -1.4 -0.138 Edmond Sumner 2.0 1.4 0.0 0.086 Elfrid Payton -0.6 -0.4 0.0 0.005 Elie Okobo -1.7 -0.3 0.1 0.005 Emmanuel Mudiay -7.5 -1.8 -0.3 0.016 Enes Freedom -5.6 -0.7 -1.0 0.023 Eric Bledsoe -1.0 -0.1 -0.4 -0.009 Eric Gordon -1.8 -0.4 0.0 -0.040 Ersan İlyasova -0.2 0.0 -0.4 0.028 Evan Fournier 3.4 -0.4 -0.2 0.029 Evan Turner -3.5 -1.9 -0.1 -0.087 Frank Jackson -1.8 -0.1 0.0 -0.029 Frank Kaminsky 1.1 0.6 0.1 -0.013 Frank Mason III 1.8 1.0 0.4 0.031 Frank Ntilikina 0.6 0.2 0.1 0.056 Fred VanVleet 6.6 1.8 0.0 0.003 Furkan Korkmaz 4.0 0.0 0.0 0.030 Garrett Temple 2.5 1.1 0.1 -0.044 Gary Clark 0.8 0.0 0.4 0.075 Gary Harris -2.5 -0.1 -0.2 -0.004 Gary Payton II 0.2 0.4 0.6 -0.211 Gary Trent Jr. 
6.2 0.7 0.3 0.124 George Hill 1.8 0.8 0.1 0.064 Georges Niang 1.9 0.1 0.0 -0.037 Giannis Antetokounmpo 1.8 -0.3 0.0 -0.025 Glenn Robinson III 7.5 1.1 0.9 0.066 Goran Dragić 2.5 0.3 -0.1 0.028 Gordon Hayward 6.0 0.7 0.4 0.034 Gorgui Dieng 1.0 0.3 0.3 -0.045 Grayson Allen 3.1 0.7 0.1 0.090 Hamidou Diallo 3.2 0.5 0.3 -0.009 Harrison Barnes -1.9 0.7 0.4 0.040 Harry Giles -0.1 -0.2 -0.2 0.051 Hassan Whiteside 3.2 0.4 0.3 0.050 Henry Ellenson -5.6 -0.6 0.1 -0.268 Ian Mahinmi 3.3 0.6 0.7 0.043 Iman Shumpert -3.3 -0.9 0.3 -0.046 Isaac Bonga 4.1 0.5 0.7 0.352 Isaiah Hartenstein 2.8 0.3 0.5 0.169 Isaiah Thomas 4.1 1.8 -0.1 0.065 Ish Smith 2.0 1.3 0.0 0.028 Ivica Zubac -0.6 0.0 0.8 0.054 J.J. Barea -3.2 -1.7 0.0 -0.007 J.R. Smith -3.9 -1.4 0.0 -0.024 JJ Redick -2.8 -0.7 -0.1 0.013 JaKarr Sampson -15.4 -0.4 -0.7 0.054 JaMychal Green -2.6 0.0 -0.4 -0.054 JaVale McGee -5.4 -0.2 -0.8 0.013 Jabari Parker -0.5 -0.6 0.4 0.017 Jacob Evans 3.1 0.3 0.1 -0.004 Jae Crowder -1.4 0.8 0.0 0.002 Jahlil Okafor -0.1 0.5 0.2 0.037 Jake Layman 1.5 0.0 -0.1 -0.056 Jakob Poeltl 0.1 0.6 -0.3 -0.021 Jalen Brunson -1.1 0.1 0.1 -0.001 Jamal Crawford -2.9 -0.6 -0.1 0.103 Jamal Murray 0.3 0.0 -0.1 0.019 James Ennis III -0.1 0.2 0.0 -0.023 James Harden -1.8 0.0 0.2 0.002 James Johnson 0.6 -0.2 0.4 0.046 Jared Dudley -3.4 -0.8 -0.5 -0.023 Jaren Jackson Jr. 
3.6 0.3 -0.3 -0.037 Jarred Vanderbilt -0.3 0.0 -0.1 0.151 Jarrett Allen 0.2 0.2 0.7 0.059 Jaylen Brown 7.3 0.7 0.2 0.016 Jayson Tatum 7.7 0.9 0.1 0.000 Jeff Green -2.9 -0.8 -0.2 -0.013 Jeff Teague -1.2 -3.0 0.1 0.013 Jerami Grant -1.6 0.2 -0.4 -0.019 Jeremy Lamb -2.8 -0.1 -0.3 0.011 Jerian Grant 0.3 -1.1 0.0 -0.048 Jerome Robinson 1.7 0.8 0.1 -0.029 Jevon Carter 0.5 -0.4 0.1 0.113 Jimmy Butler 1.2 2.0 -0.1 -0.007 Joakim Noah -4.3 -0.7 -0.4 -0.016 Joe Chealey -1.5 -0.7 0.0 -0.333 Joe Harris 0.8 -0.3 0.2 -0.014 Joe Ingles -2.3 -0.5 0.0 -0.003 Joel Embiid -4.5 -0.7 0.3 -0.007 John Collins 2.1 -0.5 -0.8 0.023 John Henson -0.1 0.4 0.1 0.093 Johnathan Motley -2.4 0.1 -0.6 0.199 Johnathan Williams -3.5 0.0 -0.4 -0.032 Jonah Bolden -3.3 -0.9 -0.7 -0.130 Jonas Valančiūnas -0.7 0.5 0.8 0.026 Jonathan Isaac 2.3 0.3 0.4 0.041 Jordan Bell -0.1 -0.5 0.2 0.006 Jordan Clarkson -1.6 -0.5 -0.3 0.006 Jordan McRae 5.6 1.4 0.3 -0.062 Josh Hart 2.3 0.3 0.4 0.016 Josh Jackson -2.5 -0.7 -0.3 0.027 Josh Okogie 0.9 0.4 0.8 0.041 Josh Richardson -2.9 -1.2 0.0 0.018 Jrue Holiday -2.1 -1.0 0.2 -0.017 Juancho Hernangómez 0.2 0.0 0.0 -0.034 Julius Randle -1.9 0.0 0.2 -0.064 Justin Anderson -0.9 0.3 -0.4 -0.145 Justin Holiday -2.2 -0.5 -0.2 0.042 Justin Jackson -1.7 -0.4 -0.1 -0.051 Justin Patton 0.1 -0.6 -0.5 0.114 Justise Winslow -1.3 -0.3 0.5 -0.045 Jusuf Nurkić 2.0 0.8 -0.5 -0.013 Kadeem Allen -4.9 -1.9 -0.2 -0.029 Karl-Anthony Towns 2.1 1.0 -0.7 -0.010 Kawhi Leonard 0.5 1.6 -0.4 -0.026 Keita Bates-Diop 1.5 0.1 0.1 0.004 Kelly Olynyk -1.8 -0.1 -0.2 -0.001 Kelly Oubre Jr. 
3.5 0.3 0.2 0.007 Kemba Walker -5.2 -1.1 0.0 -0.009 Kenrich Williams -2.6 -0.3 0.1 -0.037 Kent Bazemore -2.8 -0.9 -0.2 -0.027 Kentavious Caldwell-Pope -2.1 0.3 0.0 0.037 Kevin Huerter 2.5 0.9 -0.2 -0.006 Kevin Knox -6.4 -0.2 -0.4 -0.011 Kevin Love 0.6 1.0 -0.5 0.065 Kevon Looney -2.9 -0.5 -1.0 -0.258 Khem Birch -0.4 0.2 0.3 -0.093 Khris Middleton 2.6 0.0 0.1 0.056 Khyri Thomas -0.2 0.1 -0.1 -0.025 Kostas Antetokounmpo 0.4 0.4 0.4 1.000 Kris Dunn -4.0 -2.6 0.1 0.019 Kyle Anderson -2.2 -0.6 -0.2 -0.069 Kyle Korver -1.9 0.0 0.2 0.014 Kyle Kuzma -5.9 -1.2 0.0 -0.020 Kyle Lowry 5.2 -1.2 0.0 0.005 Kyle O'Quinn 0.0 0.6 0.6 -0.013 Kyrie Irving 3.6 -0.5 0.0 -0.009 LaMarcus Aldridge -2.4 0.0 -1.2 -0.026 Lance Thomas -1.1 0.3 -0.2 -0.048 Landry Shamet 0.2 0.4 -0.2 -0.027 Langston Galloway 1.9 0.4 -0.1 0.047 Larry Nance Jr. 0.7 -1.0 -0.6 0.011 Lauri Markkanen -4.0 0.1 -0.2 -0.005 LeBron James -2.1 1.9 0.0 -0.017 Lonnie Walker IV 3.8 0.6 0.4 0.078 Lonzo Ball 1.9 1.6 0.0 -0.003 Lou Williams -1.8 0.2 0.0 -0.007 Luc Mbah a Moute -3.3 -0.5 -0.2 -0.044 Luka Dončić 7.6 2.8 0.1 0.036 Luke Kennard 6.1 2.3 0.1 0.004 Luke Kornet -1.0 -0.3 0.0 0.061 Malcolm Brogdon 0.9 3.9 -0.1 -0.067 Malcolm Miller -2.2 0.3 0.0 -0.009 Malik Beasley -0.1 0.2 -0.1 -0.049 Malik Monk 1.4 0.5 0.3 0.047 Marc Gasol -6.1 -1.1 -0.3 -0.021 Marco Belinelli -4.2 -0.5 -0.1 -0.021 Marcus Morris 2.8 -0.1 0.0 -0.009 Marcus Smart 4.0 0.9 0.0 -0.047 Mario Hezonja -4.0 -0.6 0.1 0.010 Markelle Fultz 3.9 2.0 -0.8 0.046 Markieff Morris 0.3 -0.1 -0.4 0.022 Marquese Chriss 5.1 1.4 1.0 0.173 Marvin Bagley III -0.7 -0.2 -0.4 -0.037 Marvin Williams -4.2 -0.2 -0.5 0.024 Mason Plumlee -0.6 -0.5 -0.4 0.022 Matthew Dellavedova -2.8 -0.6 0.2 -0.051 Maurice Harkless -1.9 -0.1 -0.4 0.015 Maxi Kleber 2.3 0.2 0.2 0.008 Melvin Frazier 0.6 0.1 -0.2 0.108 Meyers Leonard 0.2 -0.1 -0.2 -0.036 Michael Carter-Williams 2.4 -0.1 0.3 0.053 Michael Kidd-Gilchrist -4.3 -0.4 -0.8 -0.143 Mikal Bridges 0.8 -0.3 0.2 0.080 Mike Conley -6.7 -2.0 0.1 -0.029 
Mike Muscala -2.2 -0.3 -0.6 0.005 Mike Scott 0.2 0.0 0.5 0.026 Miles Bridges 5.5 0.6 0.6 -0.040 Mitchell Robinson 2.4 0.0 0.3 0.048 Mo Bamba -0.8 -0.1 0.2 -0.019 Monte Morris -1.4 -0.1 -0.1 -0.034 Montrezl Harrell 2.0 -0.3 0.4 -0.035 Moritz Wagner 3.9 0.6 0.8 0.130 Myles Turner -1.2 -0.4 0.0 -0.030 Naz Mitrou-Long 1.7 0.5 -0.1 0.053 Nemanja Bjelica 1.9 0.9 -0.1 0.002 Nerlens Noel 2.5 0.3 -0.1 0.097 Nicolas Batum -5.7 -0.3 0.2 -0.104 Nikola Jokić -0.2 -0.3 -0.6 0.017 Nikola Vučević -1.2 -0.2 -0.5 -0.041 Noah Vonleh -4.7 -1.1 -0.7 0.095 Norman Powell 7.4 0.3 0.2 0.012 OG Anunoby 3.6 0.9 0.3 0.052 Omari Spellman 1.7 0.0 0.0 0.029 Otto Porter Jr. -2.0 -0.3 -0.1 -0.022 P.J. Tucker -0.4 0.4 0.1 0.019 PJ Dozier 2.6 1.4 -0.7 0.033 Pascal Siakam 6.0 0.4 -0.5 -0.096 Pat Connaughton -1.5 -0.4 -0.1 -0.011 Patrick Beverley 0.3 -0.2 0.1 0.024 Patrick McCaw 2.0 1.1 0.3 0.001 Patrick Patterson 1.3 0.2 -0.1 0.034 Patty Mills 1.7 -1.2 0.0 0.006 Paul George -6.5 -0.2 -0.9 0.001 Paul Millsap -1.0 -0.4 -0.3 -0.002 Quinn Cook -1.8 -0.5 0.0 -0.040 Rajon Rondo -2.1 -3.0 -0.2 0.013 Raul Neto -0.2 -0.7 0.0 -0.005 Reggie Bullock -3.2 -0.6 0.1 -0.010 Reggie Jackson -3.5 -0.1 0.0 -0.010 Richaun Holmes 4.1 0.1 1.3 0.040 Ricky Rubio 0.3 2.7 0.2 0.011 Robert Covington -0.9 0.0 0.1 -0.009 Robert Williams 2.7 0.7 0.6 0.021 Robin Lopez -4.1 -0.5 -1.1 -0.076 Rodions Kurucs -3.9 0.3 -0.4 -0.004 Rodney Hood -0.2 -0.3 0.2 0.071 Rodney McGruder -4.3 -1.1 -0.4 -0.005 Rondae Hollis-Jefferson -1.9 0.2 0.4 0.060 Royce O'Neale 1.1 1.0 0.1 -0.042 Rudy Gay -2.9 -0.9 0.0 -0.058 Rudy Gobert -0.8 -0.5 -0.4 0.024 Russell Westbrook 4.3 -3.7 0.3 0.044 Ryan Anderson 0.0 0.2 -0.7 -0.018 Ryan Arcidiacono -2.2 -1.6 0.0 -0.038 Ryan Broekhoff 0.2 0.1 0.1 -0.079 Semi Ojeleye 0.1 0.1 0.0 -0.016 Serge Ibaka 0.4 0.1 0.0 -0.017 Seth Curry 4.5 1.0 0.0 0.039 Shabazz Napier 0.9 2.1 0.2 0.023 Shai Gilgeous-Alexander 8.2 0.0 0.0 -0.005 Shake Milton 5.0 1.7 -0.1 0.093 Shaquille Harrison -1.6 -0.8 0.0 0.035 Sindarius Thornwell 7.0 1.7 
-0.1 0.198 Skal Labissière 2.8 0.8 1.6 0.020 Solomon Hill 1.2 0.5 -0.3 0.014 Spencer Dinwiddie 3.8 2.2 0.1 -0.027 Stanley Johnson -4.5 -0.5 -0.2 -0.016 Stephen Curry -6.5 1.4 0.1 -0.070 Sterling Brown -1.3 -0.4 0.1 -0.094 Steven Adams -3.0 0.7 -1.6 -0.003 Svi Mykhailiuk 5.8 1.0 0.1 0.081 T.J. Leaf -0.9 -0.1 0.1 -0.122 T.J. McConnell 0.1 1.6 0.1 -0.009 T.J. Warren 1.8 0.0 0.3 0.050 Taj Gibson -4.7 -0.4 -0.7 0.018 Taurean Prince -1.4 -0.3 0.4 -0.065 Terrance Ferguson -3.0 -0.1 0.0 -0.074 Terrence Ross -0.4 -0.5 -0.1 -0.025 Terry Rozier 9.0 1.2 0.4 0.036 Thabo Sefolosha -1.6 0.1 0.3 -0.070 Thaddeus Young -2.3 -0.7 -0.9 -0.079 Theo Pinson -0.9 0.5 0.1 -0.052 Thomas Bryant 2.7 0.5 0.5 -0.035 Thon Maker -0.3 0.0 0.3 0.075 Tim Frazier -1.7 -0.8 -0.4 -0.082 Tim Hardaway Jr. -2.3 -0.5 -0.1 0.041 Timothé Luwawu-Cabarrot 3.2 0.1 0.4 0.059 Tobias Harris -0.4 0.4 0.2 -0.016 Tomáš Satoranský 1.0 0.4 0.2 -0.055 Tony Bradley -0.8 0.1 -1.1 0.167 Tony Snell 2.0 1.3 -0.2 -0.007 Torrey Craig -0.3 -0.2 -0.1 0.019 Trae Young 10.5 1.2 -0.3 0.019 Treveon Graham -0.9 -0.2 0.2 0.025 Trevor Ariza -4.5 -2.0 -0.1 0.039 Trey Burke -3.5 -0.2 0.0 0.019 Trey Lyles -2.1 -0.3 0.4 0.028 Tristan Thompson 1.1 0.1 0.0 -0.017 Troy Brown Jr. 5.6 1.1 0.4 0.024 Troy Daniels -1.9 -0.1 0.0 -0.024 Tyler Johnson -3.9 -1.0 -0.1 -0.025 Tyler Zeller -6.7 -0.7 -0.3 -0.283 Tyrone Wallace -0.6 0.2 -0.1 -0.106 Tyson Chandler -1.8 -0.5 -0.7 0.162 Tyus Jones 0.5 -0.4 -0.2 0.044 Udonis Haslem 0.5 0.1 0.0 0.031 Victor Oladipo -4.3 -2.3 -0.1 -0.029 Vince Carter* -2.4 -0.3 -0.1 -0.067 Wayne Ellington -5.2 -0.2 -0.2 -0.052 Wendell Carter Jr. 
1.0 -0.6 1.2 0.049 Wes Iwundu 0.8 0.1 0.0 0.004 Wesley Matthews -4.8 -0.9 -0.2 -0.004 Will Barton 3.6 0.8 0.6 0.048 Willie Cauley-Stein -4.7 -1.1 -0.6 0.023 Willy Hernangómez -1.2 -0.1 -0.6 0.013 Wilson Chandler -0.1 -0.5 -0.6 -0.014 Yogi Ferrell -1.5 -0.5 -0.1 -0.015 Yuta Watanabe -0.6 -0.2 0.1 0.147 Zach Collins 0.4 0.6 0.9 -0.002 Zach LaVine 1.8 -0.3 0.1 -0.017 Zhaire Smith -5.6 -1.4 -0.5 -0.139
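The table above is the result of subtracting one season's data frame from the next. As a minimal sketch of how that subtraction behaves, assuming (as in the notebook) that both frames are indexed by player name and hold scraped strings, the toy example below shows that `.sub` aligns rows on the index and that `dropna` then removes anyone with a missing value in any column. The player names and numbers are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical per-game stats for two consecutive seasons (strings, as scraped)
prev_season = pd.DataFrame({"ppg": ["10.0", "20.5", "5.0"], "assists": ["2.0", "", "1.0"]},
                           index=["Player A", "Player B", "Player C"])
next_season = pd.DataFrame({"ppg": ["14.5", "19.0", "8.0"], "assists": ["3.5", "4.0", "0.5"]},
                           index=["Player A", "Player B", "Player D"])

def to_numeric_frame(df):
    # Empty strings become NaN so the columns can be cast to float
    return df.replace("", np.nan).astype(float)

# .sub aligns on the index: players missing from either season get NaN rows
toy_diff = to_numeric_frame(next_season).sub(to_numeric_frame(prev_season))

# dropna removes every row that has a NaN in any column
toy_diff_clean = toy_diff.dropna()
print(toy_diff_clean)
```

Note that Player B is dropped even though both PPG values exist, because one assists value is missing. The same would apply in the notebook to any player with a blank in one of the four selected columns.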
Next, we create graphs that show the growth or decline in offensive performance, specifically Assists Per Game (APG), between consecutive seasons. As before, a difference only exists for a player who appeared in both seasons, so we clean the data, calculate the assist difference, and plot it. Any player who did not play in both seasons is dropped from the table. The player with the highest growth in APG is highlighted in green and the MIP winner is highlighted in red.
def get_assists_diff(offense, mip, year):
    # Drop players missing from either season (NaN after the subtraction)
    offense.dropna(inplace=True)
    # Sort the DataFrame by the difference in APG
    sorted_growth_df = offense.sort_values(by='assists', ascending=False)
    # Plotting the difference in APG for all players
    plt.figure(figsize=(10, 6))
    plt.barh(sorted_growth_df.index, sorted_growth_df['assists'], color='skyblue')
    plt.title(f'Difference in Assists Per Game (APG) in {year} from last season')
    plt.xlabel('Difference in APG')
    plt.ylabel('Players')
    # Highlighting the player with the highest difference
    highest_difference_player = sorted_growth_df.index[0]
    highest_difference = sorted_growth_df.loc[highest_difference_player, 'assists']
    plt.barh(highest_difference_player, highest_difference, color='green', label='Highest APG growth Player')
    # Annotating the highest difference (the first sorted bar sits at y-position 0)
    plt.text(highest_difference, 0, f"{highest_difference_player}: {highest_difference}", va='bottom', ha='right', color='green')
    # Highlighting the award player
    if mip in sorted_growth_df.index:
        award_player_difference = sorted_growth_df.loc[mip, 'assists']
        plt.barh(mip, award_player_difference, color='red', label=f'MIP Winner {year} Season')
        # Annotate at the MIP's position in the sorted order used for the bars
        plt.text(award_player_difference, sorted_growth_df.index.get_loc(mip), f'{mip}: {award_player_difference}', va='bottom', ha='right', color='red')
    plt.legend(loc='lower left')
    plt.tight_layout()
    plt.show()
oo = 0
for offense in allOffense:
    mip = pastMipWinners[oo]
    get_assists_diff(offense, mip, years[oo])
    oo += 1
Next, we create graphs that show the growth or decline in offensive performance, specifically offensive rebounds per game (OR), between consecutive seasons. As before, a difference only exists for a player who appeared in both seasons, so we clean the data, calculate the difference, and plot it. Any player who did not play in both seasons is dropped from the table. The player with the highest growth in offensive rebounds per game is highlighted in green and the MIP winner is highlighted in red.
def get_or_diff(offense, mip, year):
    # Drop players missing from either season (NaN after the subtraction)
    offense.dropna(inplace=True)
    # Sort the DataFrame by the difference in offensive rebounds
    sorted_growth_df = offense.sort_values(by='offensive rebounds', ascending=False)
    # Plotting the difference in offensive rebounds for all players
    plt.figure(figsize=(10, 6))
    plt.barh(sorted_growth_df.index, sorted_growth_df['offensive rebounds'], color='skyblue')
    plt.title(f'Difference in Offensive Rebounds Per Game (OR) in {year} from last season')
    plt.xlabel('Difference in OR')
    plt.ylabel('Players')
    # Highlighting the player with the highest difference
    highest_difference_player = sorted_growth_df.index[0]
    highest_difference = sorted_growth_df.loc[highest_difference_player, 'offensive rebounds']
    plt.barh(highest_difference_player, highest_difference, color='green', label='Highest Offensive Rebounds growth Player')
    # Annotating the highest difference (the first sorted bar sits at y-position 0)
    plt.text(highest_difference, 0, f"{highest_difference_player}: {highest_difference}", va='bottom', ha='right', color='green')
    # Highlighting the award player
    if mip in sorted_growth_df.index:
        award_player_difference = sorted_growth_df.loc[mip, 'offensive rebounds']
        plt.barh(mip, award_player_difference, color='red', label=f'MIP Winner {year} Season')
        # Annotate at the MIP's position in the sorted order used for the bars
        plt.text(award_player_difference, sorted_growth_df.index.get_loc(mip), f'{mip}: {award_player_difference}', va='bottom', ha='right', color='red')
    plt.legend(loc='lower left')
    plt.tight_layout()
    plt.show()
yc = 0
for offense in allOffense:
    mip = pastMipWinners[yc]
    get_or_diff(offense, mip, years[yc])
    yc += 1
Next, we create graphs that show the growth or decline in offensive performance, specifically field goal percentage (FGP), between consecutive seasons. As before, a difference only exists for a player who appeared in both seasons, so we clean the data, calculate the change in field goal percentage, and plot it. Any player who did not play in both seasons is dropped from the table. The player with the highest growth in field goal percentage is highlighted in green and the MIP winner is highlighted in red.
def get_fgp_diff(offense, mip, year):
    # Drop players missing from either season (NaN after the subtraction)
    offense.dropna(inplace=True)
    # Sort the DataFrame by the difference in field goal percentage
    sorted_growth_df = offense.sort_values(by='fg percentage', ascending=False)
    # Plotting the difference in field goal percentage for all players
    plt.figure(figsize=(10, 6))
    plt.barh(sorted_growth_df.index, sorted_growth_df['fg percentage'], color='skyblue')
    plt.title(f'Difference in Field Goal Percentage (FGP) in {year} From Last Season')
    plt.xlabel('Difference in FGP')
    plt.ylabel('Players')
    # Highlighting the player with the highest difference
    highest_difference_player = sorted_growth_df.index[0]
    highest_difference = sorted_growth_df.loc[highest_difference_player, 'fg percentage']
    plt.barh(highest_difference_player, highest_difference, color='green', label='Highest FGP growth Player')
    # Annotating the highest difference (the first sorted bar sits at y-position 0)
    plt.text(highest_difference, 0, f"{highest_difference_player}: {highest_difference}", va='bottom', ha='right', color='green')
    # Highlighting the award player
    if mip in sorted_growth_df.index:
        award_player_difference = sorted_growth_df.loc[mip, 'fg percentage']
        plt.barh(mip, award_player_difference, color='red', label=f'MIP Winner {year} Season')
        # Annotate at the MIP's position in the sorted order used for the bars
        plt.text(award_player_difference, sorted_growth_df.index.get_loc(mip), f'{mip}: {award_player_difference}', va='bottom', ha='right', color='red')
    plt.legend(loc='lower left')
    plt.tight_layout()
    plt.show()
yc = 0
for offense in allOffense:
    mip = pastMipWinners[yc]
    get_fgp_diff(offense, mip, years[yc])
    yc += 1
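The four plots highlight the same two things for each statistic: who gained the most, and where the MIP winner falls. As a rough sketch, those two facts can also be read off numerically with one small helper instead of from the bar charts. The helper name `summarize_growth` is hypothetical, and the toy frame holds four rows copied from the printed 2019-20 differences.

```python
import pandas as pd

# Four rows taken from the printed 2019-20 season-over-season differences
toy_growth = pd.DataFrame(
    {"ppg": [5.5, 13.5, -1.9, 7.0], "assists": [1.2, 4.9, 0.0, 2.9]},
    index=["Brandon Ingram", "Devonte' Graham", "Julius Randle", "Bam Adebayo"],
)

def summarize_growth(diff_df, stat, mip):
    """Return the top gainer for `stat` and where `mip` ranks (1 = largest gain)."""
    ordered = diff_df[stat].sort_values(ascending=False)
    top_player = ordered.index[0]
    # Assumes player names are unique in the index, so get_loc returns an integer
    mip_rank = ordered.index.get_loc(mip) + 1 if mip in ordered.index else None
    return {"stat": stat, "top_player": top_player,
            "top_gain": ordered.iloc[0], "mip_rank": mip_rank}

for stat in ["ppg", "assists"]:
    print(summarize_growth(toy_growth, stat, "Brandon Ingram"))
```

Because the statistic is a parameter, the same function covers PPG, assists, offensive rebounds and field goal percentage without a separate copy for each.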
This next section conducts a regression analysis of MIP voting rank against each player's season-over-season statistical changes. For each season, we model a player's rank as a function of the change in their points per game, assists per game, offensive rebounds per game, and field goal percentage, using the differences computed above. We then evaluate how accurately the fitted model predicts the ranks of players held out from training.
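Each season's ranked list is short, so it helps to keep in mind how ordinary least squares behaves when the number of training rows is close to the number of coefficients. The sketch below is an illustration on synthetic noise, not the notebook's data. It fits an intercept plus four predictors to only five rows using NumPy's `lstsq` as a stand-in for `sm.OLS`, then scores the fit on the training rows and on separate held-out rows. All names and sizes here are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 4 predictors with no real relationship to the target (pure noise)
n_train, n_test, n_features = 5, 10, 4
X_train = rng.normal(size=(n_train, n_features))
X_test = rng.normal(size=(n_test, n_features))
y_train = rng.normal(size=n_train)
y_test = rng.normal(size=n_test)

def add_intercept(X):
    # Prepend a column of ones, the same role sm.add_constant plays
    return np.column_stack([np.ones(len(X)), X])

def r_squared(y_true, y_pred):
    # 1 - SS_res / SS_tot; negative when predictions are worse than the mean
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Ordinary least squares: 5 coefficients fitted to 5 rows
coef, *_ = np.linalg.lstsq(add_intercept(X_train), y_train, rcond=None)

r2_train = r_squared(y_train, add_intercept(X_train) @ coef)
r2_test = r_squared(y_test, add_intercept(X_test) @ coef)
print(f"train R^2: {r2_train:.3f}, test R^2: {r2_test:.3f}")
```

With as many coefficients as training rows, the model can pass through every training point, so the training R-squared is essentially 1 even though the data is noise. The held-out R-squared is the more honest number, and it can fall below zero.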
# Fit an OLS model of MIP voting rank on the season-over-season change in each statistic;
# the coefficients estimate how each change relates to a player's rank
def regression2(pureOffensive, mipdata, year):
    # Keep only players who appear in both the stat differences and the MIP rankings
    pureOffensive = pd.merge(pureOffensive, mipdata, left_index=True, right_index=True)
    pureOffensive['Rank'] = pureOffensive['Rank'].astype(int)
    X = pureOffensive[['ppg', 'assists', 'offensive rebounds', 'fg percentage']]
    y = pureOffensive['Rank']
    X = sm.add_constant(X)
    # test_size=0.6 holds out 60% of the ranked players and fits on the remaining 40%
    X1_train, X1_test, y1_train, y1_test = train_test_split(X, y, test_size=0.6, random_state=42)
    # Train the model
    model1 = sm.OLS(y1_train, X1_train).fit()
    print(model1.summary())
    # Make predictions, clamping them so no predicted rank falls below 1
    y1_pred = model1.predict(X1_test)
    y1_pred = np.maximum(y1_pred, 1)
    # Calculate evaluation metrics on the held-out players
    mse1 = mean_squared_error(y1_test, y1_pred)
    r21 = r2_score(y1_test, y1_pred)
    print(f'Mean Squared Error: {mse1}')
    print(f'R-squared: {r21}')
    # Plot the actual vs predicted values; the red line marks perfect prediction
    plt.figure(figsize=(10, 6))
    plt.scatter(y1_test, y1_pred, color='skyblue')
    plt.plot([min(y1_test), max(y1_test)], [min(y1_test), max(y1_test)], color='red', linewidth=2)
    plt.title(f'Actual vs Predicted Ranks {year} Season')
    plt.xlabel('Actual Rank')
    plt.ylabel('Predicted Rank')
    plt.show()
listMip = [mipdata1920, mipdata2021, mipdata2122, mipdata2223, mipdata2324]
for i, j in enumerate(allOffense):
    regression2(j, listMip[i], years[i])
OLS Regression Results
==============================================================================
Dep. Variable: Rank R-squared: 0.979
Model: OLS Adj. R-squared: 0.896
Method: Least Squares F-statistic: 11.78
Date: Fri, 17 May 2024 Prob (F-statistic): 0.215
Time: 16:30:55 Log-Likelihood: -4.1250
No. Observations: 6 AIC: 18.25
Df Residuals: 1 BIC: 17.21
Df Model: 4
Covariance Type: nonrobust
======================================================================================
coef std err t P>|t| [0.025 0.975]
--------------------------------------------------------------------------------------
const 20.1963 3.308 6.105 0.103 -21.835 62.227
ppg -1.6348 0.478 -3.420 0.181 -7.708 4.439
assists -0.3771 0.385 -0.979 0.507 -5.273 4.519
offensive rebounds 7.4167 3.332 2.226 0.269 -34.926 49.759
fg percentage 133.5644 25.261 5.287 0.119 -187.411 454.540
==============================================================================
Omnibus: nan Durbin-Watson: 1.589
Prob(Omnibus): nan Jarque-Bera (JB): 0.541
Skew: 0.651 Prob(JB): 0.763
Kurtosis: 2.315 Cond. No. 480.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Mean Squared Error: 60.39356303772042
R-squared: -2.524408217619131
/opt/homebrew/lib/python3.11/site-packages/statsmodels/stats/stattools.py:74: ValueWarning: omni_normtest is not valid with less than 8 observations; 6 samples were given.
warn("omni_normtest is not valid with less than 8 observations; %i "
OLS Regression Results
==============================================================================
Dep. Variable: Rank R-squared: 0.638
Model: OLS Adj. R-squared: 0.348
Method: Least Squares F-statistic: 2.200
Date: Fri, 17 May 2024 Prob (F-statistic): 0.205
Time: 16:30:55 Log-Likelihood: -28.556
No. Observations: 10 AIC: 67.11
Df Residuals: 5 BIC: 68.63
Df Model: 4
Covariance Type: nonrobust
======================================================================================
coef std err t P>|t| [0.025 0.975]
--------------------------------------------------------------------------------------
const 19.9431 5.654 3.527 0.017 5.408 34.478
ppg -1.9155 0.832 -2.302 0.070 -4.054 0.223
assists 0.4932 2.598 0.190 0.857 -6.185 7.172
offensive rebounds 5.8328 6.788 0.859 0.429 -11.617 23.283
fg percentage 15.3430 93.296 0.164 0.876 -224.482 255.167
==============================================================================
Omnibus: 0.225 Durbin-Watson: 2.319
Prob(Omnibus): 0.894 Jarque-Bera (JB): 0.388
Skew: 0.186 Prob(JB): 0.824
Kurtosis: 2.110 Cond. No. 289.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Mean Squared Error: 39.694532862511245
R-squared: -0.4419228114409153
/opt/homebrew/lib/python3.11/site-packages/scipy/stats/_stats_py.py:1971: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=10 k, _ = kurtosistest(a, axis)
/opt/homebrew/lib/python3.11/site-packages/statsmodels/stats/stattools.py:74: ValueWarning: omni_normtest is not valid with less than 8 observations; 4 samples were given.
warn("omni_normtest is not valid with less than 8 observations; %i "
/opt/homebrew/lib/python3.11/site-packages/statsmodels/regression/linear_model.py:1795: RuntimeWarning: divide by zero encountered in divide
return 1 - (np.divide(self.nobs - self.k_constant, self.df_resid)
/opt/homebrew/lib/python3.11/site-packages/statsmodels/regression/linear_model.py:1795: RuntimeWarning: invalid value encountered in scalar multiply
return 1 - (np.divide(self.nobs - self.k_constant, self.df_resid)
/opt/homebrew/lib/python3.11/site-packages/statsmodels/regression/linear_model.py:1717: RuntimeWarning: divide by zero encountered in scalar divide
return np.dot(wresid, wresid) / self.df_resid
OLS Regression Results
==============================================================================
Dep. Variable: Rank R-squared: 1.000
Model: OLS Adj. R-squared: nan
Method: Least Squares F-statistic: nan
Date: Fri, 17 May 2024 Prob (F-statistic): nan
Time: 16:30:55 Log-Likelihood: 120.86
No. Observations: 4 AIC: -233.7
Df Residuals: 0 BIC: -236.2
Df Model: 3
Covariance Type: nonrobust
======================================================================================
coef std err t P>|t| [0.025 0.975]
--------------------------------------------------------------------------------------
const 10.4852 inf 0 nan nan nan
ppg -0.2691 inf -0 nan nan nan
assists -1.4575 inf -0 nan nan nan
offensive rebounds -16.2889 inf -0 nan nan nan
fg percentage -2.8961 inf -0 nan nan nan
==============================================================================
Omnibus: nan Durbin-Watson: 1.017
Prob(Omnibus): nan Jarque-Bera (JB): 0.637
Skew: 0.035 Prob(JB): 0.727
Kurtosis: 1.046 Cond. No. 70.6
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The input rank is higher than the number of observations.
Mean Squared Error: 28.22721138842799
R-squared: -1.5807736126562735
/opt/homebrew/lib/python3.11/site-packages/statsmodels/stats/stattools.py:74: ValueWarning: omni_normtest is not valid with less than 8 observations; 5 samples were given.
warn("omni_normtest is not valid with less than 8 observations; %i "
OLS Regression Results
==============================================================================
Dep. Variable: Rank R-squared: 1.000
Model: OLS Adj. R-squared: nan
Method: Least Squares F-statistic: nan
Date: Fri, 17 May 2024 Prob (F-statistic): nan
Time: 16:30:55 Log-Likelihood: 158.82
No. Observations: 5 AIC: -307.6
Df Residuals: 0 BIC: -309.6
Df Model: 4
Covariance Type: nonrobust
======================================================================================
coef std err t P>|t| [0.025 0.975]
--------------------------------------------------------------------------------------
const 8.7721 inf 0 nan nan nan
ppg -1.2995 inf -0 nan nan nan
assists 3.4118 inf 0 nan nan nan
offensive rebounds 1.0407 inf 0 nan nan nan
fg percentage 58.7885 inf 0 nan nan nan
==============================================================================
Omnibus: nan Durbin-Watson: 0.429
Prob(Omnibus): nan Jarque-Bera (JB): 0.524
Skew: 0.481 Prob(JB): 0.770
Kurtosis: 1.739 Cond. No. 380.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Mean Squared Error: 11.009424092482377
R-squared: -1.2087872787425455
OLS Regression Results
==============================================================================
Dep. Variable: Rank R-squared: 1.000
Model: OLS Adj. R-squared: nan
Method: Least Squares F-statistic: nan
Date: Fri, 17 May 2024 Prob (F-statistic): nan
Time: 16:30:55 Log-Likelihood: 160.01
No. Observations: 5 AIC: -310.0
Df Residuals: 0 BIC: -312.0
Df Model: 4
Covariance Type: nonrobust
======================================================================================
coef std err t P>|t| [0.025 0.975]
--------------------------------------------------------------------------------------
const 10.9733 inf 0 nan nan nan
ppg -3.7525 inf -0 nan nan nan
assists 10.6587 inf 0 nan nan nan
offensive rebounds 3.7924 inf 0 nan nan nan
fg percentage 67.0659 inf 0 nan nan nan
==============================================================================
Omnibus: nan Durbin-Watson: 1.038
Prob(Omnibus): nan Jarque-Bera (JB): 0.721
Skew: -0.492 Prob(JB): 0.697
Kurtosis: 1.422 Cond. No. 313.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Mean Squared Error: 69.60874222280657
R-squared: -3.097607645383235
When analyzing the regressions on the offensive stats, it is important to note that each season's Most Improved Player ladder contains only a small number of ranked players, and that number varies from year to year, so a regression may not have enough samples to be effective. For the 19/20 Most Improved Player table, the F-statistic p-value was 0.215, meaning that the model was not statistically significant. Offensive rebounds and field goal percentage were the only parameters with positive coefficients, which would correspond to a larger rank number (a lower placement on the ladder, since rank 1 is the winner), but all of the parameters were statistically insignificant. For the 20/21 season, the negative relationship between points per game and rank was nearly statistically significant, with a p-value of 0.070 against the usual threshold of 0.05. For the 21/22 through 23/24 seasons, the models were fit on no more rows than they had parameters (Df Residuals: 0), so the fit is exact (R-squared of 1.000) and no standard errors or p-values could be computed.
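The `Df Residuals: 0` rows in the summaries above can be reproduced on synthetic data. The sketch below is not the pipeline used in this tutorial; it uses NumPy's `np.linalg.lstsq` on made-up numbers (the array names and the random seed are assumptions for illustration) to show that when the number of training rows equals the number of fitted parameters, the in-sample fit is exact even if the target is pure noise.

```python
import numpy as np

rng = np.random.default_rng(0)

n_params = 5  # intercept + 4 predictors, as in the offensive models
# Hypothetical training data: 5 rows, so rows == parameters
X_train = np.column_stack([np.ones(n_params), rng.normal(size=(n_params, 4))])
y_train = rng.normal(size=n_params)  # pure noise, unrelated to X_train

# Ordinary least squares via lstsq
beta, _, rank, _ = np.linalg.lstsq(X_train, y_train, rcond=None)

residuals = y_train - X_train @ beta
ss_res = float(residuals @ residuals)
ss_tot = float(((y_train - y_train.mean()) ** 2).sum())
r2_train = 1.0 - ss_res / ss_tot
df_resid = len(y_train) - rank

print(f"residual degrees of freedom: {df_resid}")
print(f"in-sample R-squared: {r2_train:.3f}")
```

A perfect in-sample fit of this kind says nothing about predictive power, which is consistent with the summaries that show `inf` standard errors and `nan` p-values.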
Here, we visualize the correlations between various offensive player statistics and the likelihood of a player being selected as the Most Improved Player for the 2019-2020 basketball season. We plot the data using a heatmap. The cells in the heatmap are color-coded based on the correlation values: positive correlations are shown in reddish tones and negative correlations in bluish tones.
def corrMOffensive1820(pureOffensive1820, mipdata1920):
    pureOffensive1820 = pd.merge(pureOffensive1820, mipdata1920, left_index=True, right_index=True)
    correlation_matrix = pureOffensive1820.corr()
    # Plot the correlation matrix
    plt.figure(figsize=(10, 8))
    sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f", annot_kws={"size": 10})
    plt.title('Correlation Matrix for 2018-2020 Seasons for MIP')
    plt.show()

corrMOffensive1820(pureOffensive1820, mipdata1920)
# direct positive correlations are redder and negative correlations are bluer
Here, we visualize the correlations between various offensive player statistics and the likelihood of a player being selected as the Most Improved Player for the 2020-2021 basketball season. We plot the data using a heatmap. The cells in the heatmap are color-coded based on the correlation values: positive correlations are shown in reddish tones and negative correlations in bluish tones.
def corrMOffensive1921(pureOffensive1921, mipdata2021):
    pureOffensive1921 = pd.merge(pureOffensive1921, mipdata2021, left_index=True, right_index=True)
    correlation_matrix1 = pureOffensive1921.corr()
    # Plot the correlation matrix
    plt.figure(figsize=(10, 8))
    sns.heatmap(correlation_matrix1, annot=True, cmap='coolwarm', fmt=".2f", annot_kws={"size": 10})
    plt.title('Correlation Matrix for 2019-2021 Seasons for MIP')
    plt.show()

corrMOffensive1921(pureOffensive1921, mipdata2021)
Here, we visualize the correlations between various offensive player statistics and the likelihood of a player being selected as the Most Improved Player for the 2021-2022 basketball season. We plot the data using a heatmap. The cells in the heatmap are color-coded based on the correlation values: positive correlations are shown in reddish tones and negative correlations in bluish tones.
def corrMOffensive2022(pureOffensive2022, mipdata2122):
    pureOffensive2022 = pd.merge(pureOffensive2022, mipdata2122, left_index=True, right_index=True)
    correlation_matrix2 = pureOffensive2022.corr()
    # Plot the correlation matrix
    plt.figure(figsize=(10, 8))
    sns.heatmap(correlation_matrix2, annot=True, cmap='coolwarm', fmt=".2f", annot_kws={"size": 10})
    plt.title('Correlation Matrix for 2020-2022 Seasons for MIP')
    plt.show()

corrMOffensive2022(pureOffensive2022, mipdata2122)
Here, we visualize the correlations between various offensive player statistics and the likelihood of a player being selected as the Most Improved Player for the 2022-2023 basketball season. We plot the data using a heatmap. The cells in the heatmap are color-coded based on the correlation values: positive correlations are shown in reddish tones and negative correlations in bluish tones.
def corrMOffensive2123(pureOffensive2123, mipdata2223):
    pureOffensive2123 = pd.merge(pureOffensive2123, mipdata2223, left_index=True, right_index=True)
    correlation_matrix3 = pureOffensive2123.corr()
    # Plot the correlation matrix
    plt.figure(figsize=(10, 8))
    sns.heatmap(correlation_matrix3, annot=True, cmap='coolwarm', fmt=".2f", annot_kws={"size": 10})
    plt.title('Correlation Matrix for 2021-2023 Seasons for MIP')
    plt.show()

corrMOffensive2123(pureOffensive2123, mipdata2223)
Here, we visualize the correlations between various offensive player statistics and the likelihood of a player being selected as the Most Improved Player for the 2023-2024 basketball season. We plot the data using a heatmap. The cells in the heatmap are color-coded based on the correlation values: positive correlations are shown in reddish tones and negative correlations in bluish tones.
def corrMOffensive2224(pureOffensive2224, mipdata2324):
    pureOffensive2224 = pd.merge(pureOffensive2224, mipdata2324, left_index=True, right_index=True)
    correlation_matrix4 = pureOffensive2224.corr()
    # Plot the correlation matrix
    plt.figure(figsize=(10, 8))
    sns.heatmap(correlation_matrix4, annot=True, cmap='coolwarm', fmt=".2f", annot_kws={"size": 10})
    plt.title('Correlation Matrix for 2022-2024 Seasons for MIP')
    plt.show()

corrMOffensive2224(pureOffensive2224, mipdata2324)
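When a heatmap has many cells, the column for `Rank` is the part that relates most directly to the MIP voting. The following sketch uses a small invented table (the player labels and numbers are placeholders, not data from this project) to show how the merged frame's correlation matrix can be reduced to that one column and sorted. It assumes `Rank` is already numeric.

```python
import pandas as pd

# Invented stat changes for four placeholder players
stat_diffs = pd.DataFrame(
    {"ppg": [6.0, 4.0, 2.0, 1.0], "assists": [0.5, 2.0, 1.0, 1.5]},
    index=["Player A", "Player B", "Player C", "Player D"],
)
# Invented voting ranks (1 = winner) for the same players
mip_table = pd.DataFrame({"Rank": [1, 2, 3, 4]}, index=stat_diffs.index)

# Same index-on-index inner merge used by the heatmap functions
merged = pd.merge(stat_diffs, mip_table, left_index=True, right_index=True)

# Pearson correlation of every stat with Rank, most negative first
rank_corr = merged.corr()["Rank"].drop("Rank").sort_values()
print(rank_corr)
```

A negative value here means the stat tends to increase as the rank number falls, that is, toward first place.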
Defensive stats
Next, we create graphs that analyze the growth or decline in defensive performance, specifically in terms of Blocks Per Game (BPG), across multiple basketball seasons, highlighting notable players and their improvements. Because a player must have been in the NBA for two consecutive seasons to have a season-over-season difference, we first clean the data again, then calculate the difference in blocks, and finally visualize the result in a graph. Any player who was not in the NBA for both seasons is dropped from the table. The player with the highest growth in blocks per game is highlighted in green and the Most Improved Player is highlighted in red.
# Store the blocks, steals and defensive rebounds in a data frame and replace empty strings with NaN so incomplete rows can be dropped later
pureDefensive1819 = data1819[['blocks','steals','defensive rebounds']].replace('', np.nan)
pureDefensive1920 = data1920[['blocks','steals','defensive rebounds']].replace('', np.nan)
pureDefensive2021 = data2021[['blocks','steals','defensive rebounds']].replace('', np.nan)
pureDefensive2122 = data2122[['blocks','steals','defensive rebounds']].replace('', np.nan)
pureDefensive2223 = data2223[['blocks','steals','defensive rebounds']].replace('', np.nan)
pureDefensive2324 = data2324[['blocks','steals','defensive rebounds']].replace('', np.nan)
# Convert each of the stats to floats so they can be subtracted across seasons
pureDefensive1819 = pureDefensive1819.astype(float)
pureDefensive1920 = pureDefensive1920.astype(float)
pureDefensive2021 = pureDefensive2021.astype(float)
pureDefensive2122 = pureDefensive2122.astype(float)
pureDefensive2223 = pureDefensive2223.astype(float)
pureDefensive2324 = pureDefensive2324.astype(float)
# Subtract the previous season from each season so growth (positive) and decline (negative) are recorded in the data frame
pureDefensive1820 = pureDefensive1920.sub(pureDefensive1819)
pureDefensive1921 = pureDefensive2021.sub(pureDefensive1920)
pureDefensive2022 = pureDefensive2122.sub(pureDefensive2021)
pureDefensive2123 = pureDefensive2223.sub(pureDefensive2122)
pureDefensive2224 = pureDefensive2324.sub(pureDefensive2223)
allDefense = [pureDefensive1820, pureDefensive1921, pureDefensive2022, pureDefensive2123, pureDefensive2224]
yc = 0
years = ["19-20", "20-21", "21-22", "22-23", "23-24"]
def get_blocks_diff(defense, mip, year):
    # Any player missing from either season has NaN values and is dropped from the table
    defense.dropna(inplace=True)
    # Sort the DataFrame by the difference in BPG
    sorted_growth_df = defense.sort_values(by='blocks', ascending=False)
    # Plot the difference in BPG for all players
    plt.figure(figsize=(10, 6))
    plt.barh(sorted_growth_df.index, sorted_growth_df['blocks'], color='skyblue')
    plt.title(f'Difference in Blocks Per Game (BPG) in {year} from last season')
    plt.xlabel('Difference in BPG')
    plt.ylabel('Players')
    # Highlight the player with the highest difference
    highest_difference_player = sorted_growth_df.index[0]
    highest_difference = sorted_growth_df.loc[highest_difference_player, 'blocks']
    plt.barh(highest_difference_player, highest_difference, color='green', label='Highest BPG Growth Player')
    # Annotate the highest difference; bars are drawn in sorted order, so this player's bar sits at position 0
    plt.text(highest_difference, 0, f"{highest_difference_player}: {highest_difference}", va='bottom', ha='right', color='green')
    # Highlight the award player
    if mip in sorted_growth_df.index:
        award_player_difference = sorted_growth_df.loc[mip, 'blocks']
        plt.barh(mip, award_player_difference, color='red', label=f'MIP Winner {year} Season')
        # Use the position in the sorted frame so the label lines up with the MIP's bar
        plt.text(award_player_difference, sorted_growth_df.index.get_loc(mip), f'{mip}: {award_player_difference}', va='bottom', ha='right', color='red')
    plt.legend(loc='lower left')
    plt.tight_layout()
    plt.show()
count = 0
for defense in allDefense:
    # Index by count so each plot highlights that season's MIP winner
    mip = pastMipWinners[count]
    get_blocks_diff(defense, mip, years[count])
    count += 1
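The season-over-season differences rely on pandas aligning rows by index label before subtracting. A minimal sketch with invented players (assuming, as the code above appears to, that each season's frame is indexed by player name) shows why a player missing from either season ends up with NaN and is then removed by `dropna`.

```python
import pandas as pd

# Invented per-game blocks for two consecutive seasons
prev_season = pd.DataFrame({"blocks": [1.0, 0.4]}, index=["Veteran", "Retiree"])
curr_season = pd.DataFrame({"blocks": [1.6, 0.9]}, index=["Veteran", "Rookie"])

# .sub aligns on the index, so only "Veteran" has values on both sides
diff = curr_season.sub(prev_season)
print(diff)

# Rows with NaN (players present in only one season) are dropped
both_seasons = diff.dropna()
print(both_seasons)
```

Only the player present in both seasons survives, which is the behavior the plotting functions depend on when they call `dropna`.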
Next, we create graphs that analyze the growth or decline in defensive performance, specifically in terms of Steals Per Game (SPG), across multiple basketball seasons, highlighting notable players and their improvements. As before, a player must have been in the NBA for two consecutive seasons to have a season-over-season difference, so we clean the data, calculate the difference in steals, and visualize the result in a graph. Any player who was not in the NBA for both seasons is dropped from the table. The player with the highest growth in steals per game is highlighted in green and the Most Improved Player is highlighted in red.
def get_steals_diff(defense, mip, year):
    # Any player missing from either season has NaN values and is dropped from the table
    defense.dropna(inplace=True)
    # Sort the DataFrame by the difference in steals per game
    sorted_growth_df = defense.sort_values(by='steals', ascending=False)
    # Plot the difference in steals per game for all players
    plt.figure(figsize=(10, 6))
    plt.barh(sorted_growth_df.index, sorted_growth_df['steals'], color='skyblue')
    plt.title(f'Difference in Steals Per Game (SPG) in {year} from last season')
    plt.xlabel('Difference in SPG')
    plt.ylabel('Players')
    # Highlight the player with the highest difference
    highest_difference_player = sorted_growth_df.index[0]
    highest_difference = sorted_growth_df.loc[highest_difference_player, 'steals']
    plt.barh(highest_difference_player, highest_difference, color='green', label='Highest SPG Growth Player')
    # Annotate the highest difference; bars are drawn in sorted order, so this player's bar sits at position 0
    plt.text(highest_difference, 0, f"{highest_difference_player}: {highest_difference}", va='bottom', ha='right', color='green')
    # Highlight the award player
    if mip in sorted_growth_df.index:
        award_player_difference = sorted_growth_df.loc[mip, 'steals']
        plt.barh(mip, award_player_difference, color='red', label=f'MIP Winner {year} Season')
        # Use the position in the sorted frame so the label lines up with the MIP's bar
        plt.text(award_player_difference, sorted_growth_df.index.get_loc(mip), f'{mip}: {award_player_difference}', va='bottom', ha='right', color='red')
    plt.legend(loc='lower left')
    plt.tight_layout()
    plt.show()

count = 0
for defense in allDefense:
    mip = pastMipWinners[count]
    get_steals_diff(defense, mip, years[count])
    count += 1
Next, we create graphs that analyze the growth or decline in defensive performance, specifically in terms of Defensive Rebounds Per Game (DRPG), across multiple basketball seasons, highlighting notable players and their improvements. As before, a player must have been in the NBA for two consecutive seasons to have a season-over-season difference, so we clean the data, calculate the difference in defensive rebounds, and visualize the result in a graph. Any player who was not in the NBA for both seasons is dropped from the table. The player with the highest growth in defensive rebounds per game is highlighted in green and the Most Improved Player is highlighted in red.
def get_Dr_diff(defense, mip, year):
    # Any player missing from either season has NaN values and is dropped from the table
    defense.dropna(inplace=True)
    # Sort the DataFrame by the difference in DRPG
    sorted_growth_df = defense.sort_values(by='defensive rebounds', ascending=False)
    # Plot the difference in DRPG for all players
    plt.figure(figsize=(10, 6))
    plt.barh(sorted_growth_df.index, sorted_growth_df['defensive rebounds'], color='skyblue')
    plt.title(f'Difference in Defensive Rebounds Per Game (DRPG) in {year} from last season')
    plt.xlabel('Difference in DRPG')
    plt.ylabel('Players')
    # Highlight the player with the highest difference
    highest_difference_player = sorted_growth_df.index[0]
    highest_difference = sorted_growth_df.loc[highest_difference_player, 'defensive rebounds']
    plt.barh(highest_difference_player, highest_difference, color='green', label='Highest DRPG Growth Player')
    # Annotate the highest difference; bars are drawn in sorted order, so this player's bar sits at position 0
    plt.text(highest_difference, 0, f"{highest_difference_player}: {highest_difference}", va='bottom', ha='right', color='green')
    # Highlight the award player
    if mip in sorted_growth_df.index:
        award_player_difference = sorted_growth_df.loc[mip, 'defensive rebounds']
        plt.barh(mip, award_player_difference, color='red', label=f'MIP Winner {year} Season')
        # Use the position in the sorted frame so the label lines up with the MIP's bar
        plt.text(award_player_difference, sorted_growth_df.index.get_loc(mip), f'{mip}: {award_player_difference}', va='bottom', ha='right', color='red')
    plt.legend(loc='lower left')
    plt.tight_layout()
    plt.show()

count = 0
for defense in allDefense:
    mip = pastMipWinners[count]
    get_Dr_diff(defense, mip, years[count])
    count += 1
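The three plotting functions differ only in the column they read. As a sketch of how that shared logic could be factored out (the helper name `summarize_growth` and the sample data are hypothetical, and this is not the code used in this tutorial), the lookup of the top-growth player and of the MIP's position in the sorted frame can be written once and parameterized by the stat name.

```python
import pandas as pd

def summarize_growth(diff_df, stat, mip):
    """Return the top-growth player for `stat` and the MIP's place in the sorted order."""
    # Drop players without a value in both seasons, then sort largest gain first
    ordered = diff_df.dropna().sort_values(by=stat, ascending=False)
    top_player = ordered.index[0]
    summary = {
        "top_player": top_player,
        "top_value": ordered.loc[top_player, stat],
        "mip_position": None,
        "mip_value": None,
    }
    if mip in ordered.index:
        # Position 0 is the largest gain
        summary["mip_position"] = ordered.index.get_loc(mip)
        summary["mip_value"] = ordered.loc[mip, stat]
    return summary

# Invented differences for three placeholder players
example = pd.DataFrame(
    {"blocks": [0.2, 0.5, -0.1], "steals": [0.4, 0.1, 0.3]},
    index=["Player A", "Player B", "Player C"],
)
print(summarize_growth(example, "blocks", "Player A"))
print(summarize_growth(example, "steals", "Player C"))
```

Unlike the functions above, this helper does not modify the frame it receives, since it calls `dropna()` without `inplace=True`.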
This next section conducts a regression analysis on the player rankings and their statistics for the year. We analyze the relationship between defensive player statistics and player rankings across different basketball seasons, focusing on predicting player rankings from defensive performance and evaluating the accuracy of those predictions. We use the defensive stats computed above: blocks per game, steals per game, and defensive rebounds per game.
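With `test_size=0.6`, most of each season's voting table is held out and only the remainder is used for fitting. The sketch below works through that arithmetic for a few table sizes. It assumes, as we understand scikit-learn's behavior, that a fractional `test_size` is rounded up and the training set receives the rest; the helper name and the example sizes are illustrative rather than taken from the project data.

```python
import math

def split_sizes(n_rows, test_size=0.6, n_predictors=3):
    """Estimate train/test sizes and the residual degrees of freedom of an OLS fit."""
    n_test = math.ceil(test_size * n_rows)   # assumed rounding rule
    n_train = n_rows - n_test
    n_params = n_predictors + 1              # predictors plus the intercept
    df_resid = n_train - n_params
    return n_train, n_test, df_resid

for n_rows in (10, 15, 25):
    n_train, n_test, df_resid = split_sizes(n_rows)
    print(f"{n_rows} rows -> train {n_train}, test {n_test}, residual df {df_resid}")
```

A table of 10 rows would leave only 4 training rows for 4 parameters, so no residual degrees of freedom remain and the fit cannot be assessed statistically.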
def regression3(pureDefensive, mipdata, year):
    # Keep only players who appear in both the stat differences and the MIP table
    pureDefensive = pd.merge(pureDefensive, mipdata, left_index=True, right_index=True)
    pureDefensive['Rank'] = pureDefensive['Rank'].astype(int)
    X = pureDefensive[['blocks', 'steals', 'defensive rebounds']]
    y = pureDefensive['Rank']
    X = sm.add_constant(X)
    # Hold out 60% of the rows for testing; the rest are used for fitting
    X1_train, X1_test, y1_train, y1_test = train_test_split(X, y, test_size=0.6, random_state=42)
    # Train the model
    model1 = sm.OLS(y1_train, X1_train).fit()
    print(model1.summary())
    # Make predictions
    y1_pred = model1.predict(X1_test)
    # Ranks start at 1, so clip any prediction below 1
    y1_pred = np.maximum(y1_pred, 1)
    # Calculate evaluation metrics
    mse1 = mean_squared_error(y1_test, y1_pred)
    r21 = r2_score(y1_test, y1_pred)
    print(f'Mean Squared Error: {mse1}')
    print(f'R-squared: {r21}')
    # Plot the actual vs predicted values
    plt.figure(figsize=(10, 6))
    plt.scatter(y1_test, y1_pred, color='skyblue')
    plt.plot([min(y1_test), max(y1_test)], [min(y1_test), max(y1_test)], color='red', linewidth=2)
    plt.title(f'Actual vs Predicted Ranks {year} Season')
    plt.xlabel('Actual Rank')
    plt.ylabel('Predicted Rank')
    plt.show()

for i, j in enumerate(allDefense):
    regression3(j, listMip[i], years[i])
OLS Regression Results
==============================================================================
Dep. Variable: Rank R-squared: 0.326
Model: OLS Adj. R-squared: -0.684
Method: Least Squares F-statistic: 0.3229
Date: Fri, 17 May 2024 Prob (F-statistic): 0.814
Time: 16:41:30 Log-Likelihood: -14.560
No. Observations: 6 AIC: 37.12
Df Residuals: 2 BIC: 36.29
Df Model: 3
Covariance Type: nonrobust
======================================================================================
coef std err t P>|t| [0.025 0.975]
--------------------------------------------------------------------------------------
const 16.9049 8.136 2.078 0.173 -18.103 51.913
blocks 1.3807 18.161 0.076 0.946 -76.761 79.522
steals -4.6993 7.919 -0.593 0.613 -38.773 29.374
defensive rebounds -3.3252 3.910 -0.850 0.485 -20.149 13.499
==============================================================================
Omnibus: nan Durbin-Watson: 1.396
Prob(Omnibus): nan Jarque-Bera (JB): 0.260
Skew: -0.402 Prob(JB): 0.878
Kurtosis: 2.374 Cond. No. 19.9
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Mean Squared Error: 43.6148118407519
R-squared: -1.5452447832139073
/opt/homebrew/lib/python3.11/site-packages/statsmodels/stats/stattools.py:74: ValueWarning: omni_normtest is not valid with less than 8 observations; 6 samples were given.
warn("omni_normtest is not valid with less than 8 observations; %i "
OLS Regression Results
==============================================================================
Dep. Variable: Rank R-squared: 0.343
Model: OLS Adj. R-squared: 0.015
Method: Least Squares F-statistic: 1.045
Date: Fri, 17 May 2024 Prob (F-statistic): 0.438
Time: 16:41:30 Log-Likelihood: -31.531
No. Observations: 10 AIC: 71.06
Df Residuals: 6 BIC: 72.27
Df Model: 3
Covariance Type: nonrobust
======================================================================================
coef std err t P>|t| [0.025 0.975]
--------------------------------------------------------------------------------------
const 14.2193 3.388 4.197 0.006 5.928 22.510
blocks -4.9871 15.717 -0.317 0.762 -43.444 33.470
steals 2.7433 6.080 0.451 0.668 -12.134 17.620
defensive rebounds -4.3028 2.800 -1.537 0.175 -11.153 2.548
==============================================================================
Omnibus: 0.088 Durbin-Watson: 1.907
Prob(Omnibus): 0.957 Jarque-Bera (JB): 0.313
Skew: -0.016 Prob(JB): 0.855
Kurtosis: 2.134 Cond. No. 9.76
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Mean Squared Error: 19.123526843848808
R-squared: 0.3053287794856343
/opt/homebrew/lib/python3.11/site-packages/scipy/stats/_stats_py.py:1971: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=10 k, _ = kurtosistest(a, axis)
/opt/homebrew/lib/python3.11/site-packages/statsmodels/stats/stattools.py:74: ValueWarning: omni_normtest is not valid with less than 8 observations; 4 samples were given.
warn("omni_normtest is not valid with less than 8 observations; %i "
/opt/homebrew/lib/python3.11/site-packages/statsmodels/regression/linear_model.py:1795: RuntimeWarning: divide by zero encountered in divide
return 1 - (np.divide(self.nobs - self.k_constant, self.df_resid)
/opt/homebrew/lib/python3.11/site-packages/statsmodels/regression/linear_model.py:1795: RuntimeWarning: invalid value encountered in scalar multiply
return 1 - (np.divide(self.nobs - self.k_constant, self.df_resid)
/opt/homebrew/lib/python3.11/site-packages/statsmodels/regression/linear_model.py:1717: RuntimeWarning: divide by zero encountered in scalar divide
return np.dot(wresid, wresid) / self.df_resid
OLS Regression Results
==============================================================================
Dep. Variable: Rank R-squared: 1.000
Model: OLS Adj. R-squared: nan
Method: Least Squares F-statistic: nan
Date: Fri, 17 May 2024 Prob (F-statistic): nan
Time: 16:41:30 Log-Likelihood: 116.33
No. Observations: 4 AIC: -224.7
Df Residuals: 0 BIC: -227.1
Df Model: 3
Covariance Type: nonrobust
======================================================================================
coef std err t P>|t| [0.025 0.975]
--------------------------------------------------------------------------------------
const 210.0000 inf 0 nan nan nan
blocks -260.0000 inf -0 nan nan nan
steals -25.0000 inf -0 nan nan nan
defensive rebounds -115.0000 inf -0 nan nan nan
==============================================================================
Omnibus: nan Durbin-Watson: 0.157
Prob(Omnibus): nan Jarque-Bera (JB): 0.485
Skew: 0.682 Prob(JB): 0.785
Kurtosis: 1.976 Cond. No. 345.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Mean Squared Error: 9218.062499999956
R-squared: -841.7942857142817
/opt/homebrew/lib/python3.11/site-packages/statsmodels/stats/stattools.py:74: ValueWarning: omni_normtest is not valid with less than 8 observations; 5 samples were given.
warn("omni_normtest is not valid with less than 8 observations; %i "
OLS Regression Results
==============================================================================
Dep. Variable: Rank R-squared: 0.869
Model: OLS Adj. R-squared: 0.476
Method: Least Squares F-statistic: 2.213
Date: Fri, 17 May 2024 Prob (F-statistic): 0.450
Time: 16:41:30 Log-Likelihood: -9.4803
No. Observations: 5 AIC: 26.96
Df Residuals: 1 BIC: 25.40
Df Model: 3
Covariance Type: nonrobust
======================================================================================
coef std err t P>|t| [0.025 0.975]
--------------------------------------------------------------------------------------
const 3.9383 2.129 1.850 0.316 -23.117 30.993
blocks 16.3273 7.800 2.093 0.284 -82.785 115.440
steals -25.7250 11.643 -2.209 0.271 -173.668 122.218
defensive rebounds -3.4665 1.833 -1.891 0.310 -26.761 19.828
==============================================================================
Omnibus: nan Durbin-Watson: 2.189
Prob(Omnibus): nan Jarque-Bera (JB): 1.000
Skew: -1.081 Prob(JB): 0.607
Kurtosis: 2.652 Cond. No. 9.18
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Mean Squared Error: 33.52740546056749
R-squared: -5.7265014090166755
OLS Regression Results
==============================================================================
Dep. Variable: Rank R-squared: 0.925
Model: OLS Adj. R-squared: 0.701
Method: Least Squares F-statistic: 4.125
Date: Fri, 17 May 2024 Prob (F-statistic): 0.344
Time: 16:41:30 Log-Likelihood: -6.4269
No. Observations: 5 AIC: 20.85
Df Residuals: 1 BIC: 19.29
Df Model: 3
Covariance Type: nonrobust
======================================================================================
coef std err t P>|t| [0.025 0.975]
--------------------------------------------------------------------------------------
const 6.0057 2.466 2.435 0.248 -25.332 37.344
blocks 18.0569 7.869 2.295 0.262 -81.927 118.041
steals 8.7817 4.485 1.958 0.301 -48.200 65.764
defensive rebounds -3.6614 2.370 -1.545 0.366 -33.774 26.451
==============================================================================
Omnibus: nan Durbin-Watson: 0.905
Prob(Omnibus): nan Jarque-Bera (JB): 0.921
Skew: 1.035 Prob(JB): 0.631
Kurtosis: 2.631 Cond. No. 12.2
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Mean Squared Error: 27.847719803603102
R-squared: -0.6392916454155892
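For the regressions on the defensive stats, the F-statistics, p-values, and coefficients show that none of the defensive predictors is statistically significant in any season. The F-statistic p-values were 0.814, 0.438, 0.450, and 0.344 for the 19/20, 20/21, 22/23, and 23/24 seasons, and every coefficient p-value for blocks, steals, and defensive rebounds was well above 0.05. The 21/22 model had zero residual degrees of freedom, so no p-values could be computed for it.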
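The held-out `R-squared` values printed after each summary are mostly negative, which can look surprising next to in-sample values near 1. Out-of-sample R-squared is computed as 1 - SS_res / SS_tot and drops below zero whenever the predictions are worse than simply guessing the mean of the held-out ranks. The numbers below are invented to illustrate that formula; they are not results from this project.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

actual_ranks = [1, 2, 3, 4, 5]           # invented held-out ranks
good_preds = [1.5, 2.0, 2.5, 4.0, 5.5]   # close to the actual ranks
poor_preds = [5.0, 1.0, 6.0, 1.0, 2.0]   # worse than predicting the mean (3)

print(r_squared(actual_ranks, good_preds))
print(r_squared(actual_ranks, poor_preds))
```

Predicting the mean rank for every player gives exactly 0, so any negative value indicates a model that does worse than that baseline on the held-out rows.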
Here, we are trying to to visualize the correlations between various defensive player statistics and the likelihood of a player being selected as the Most Improved Player for the 2019-2020 basketball season. We are plotting the data using a heat map. The cells in the heatmap are color-coded based on the correlation values, where positive correlations are represented in reddish tones and negative correlations in bluish tones.
def corrMDefensive1820(pureDefensive1820, mipdata1920):
    # Join the defensive stats with the MIP data on their shared index
    pureDefensive1820 = pd.merge(pureDefensive1820, mipdata1920, left_index=True, right_index=True)
    # Pairwise correlations between all columns of the merged frame
    correlation_matrix4 = pureDefensive1820.corr()
    plt.figure(figsize=(10, 8))
    # Annotate each cell with its correlation, rounded to two decimals
    sns.heatmap(correlation_matrix4, annot=True, cmap='coolwarm', fmt=".2f", annot_kws={"size": 10})
    plt.title('Correlation Matrix for 2018-2020 Seasons for MIP')
    plt.show()

corrMDefensive1820(pureDefensive1820, mipdata1920)
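To make the numbers inside the heatmap cells concrete: `DataFrame.corr()` computes the Pearson correlation coefficient by default, a value between -1 and 1. The following is a minimal pure-Python sketch of that calculation. The sample values are made up for illustration and are not taken from the scraped data, and the comment about rank direction assumes that a lower rank number means a better finish.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up illustrative values, NOT taken from the scraped data
blocks_change = [0.1, 0.4, 0.2, 0.8, 0.5]
mip_rank = [5, 3, 4, 1, 2]  # assumption: lower number = better rank

r = pearson_r(blocks_change, mip_rank)
print(f"{r:.2f}")  # negative here: larger block gains pair with lower rank numbers
```

A strongly negative value like this one would show up as a deep blue cell under the `coolwarm` colormap, while a value near zero would appear close to neutral.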
We repeat the visualization for the 2020-2021 basketball season, again plotting the correlations between the defensive player statistics and Most Improved Player selection as a heatmap. Positive correlations appear in reddish tones and negative correlations in bluish tones.
def corrMDefensive1921(pureDefensive1921, mipdata2021):
    # Join the defensive stats with the MIP data on their shared index
    pureDefensive1921 = pd.merge(pureDefensive1921, mipdata2021, left_index=True, right_index=True)
    # Pairwise correlations between all columns of the merged frame
    correlation_matrix4 = pureDefensive1921.corr()
    plt.figure(figsize=(10, 8))
    # Annotate each cell with its correlation, rounded to two decimals
    sns.heatmap(correlation_matrix4, annot=True, cmap='coolwarm', fmt=".2f", annot_kws={"size": 10})
    plt.title('Correlation Matrix for 2019-2021 Seasons for MIP')
    plt.show()

corrMDefensive1921(pureDefensive1921, mipdata2021)
The same heatmap for the 2021-2022 basketball season shows the correlations between the defensive player statistics and Most Improved Player selection. Positive correlations appear in reddish tones and negative correlations in bluish tones.
def corrMDeffensive2022(pureDefensive2022, mipdata2122):
    # Join the defensive stats with the MIP data on their shared index
    pureDefensive2022 = pd.merge(pureDefensive2022, mipdata2122, left_index=True, right_index=True)
    # Pairwise correlations between all columns of the merged frame
    correlation_matrix4 = pureDefensive2022.corr()
    plt.figure(figsize=(10, 8))
    # Annotate each cell with its correlation, rounded to two decimals
    sns.heatmap(correlation_matrix4, annot=True, cmap='coolwarm', fmt=".2f", annot_kws={"size": 10})
    plt.title('Correlation Matrix for 2020-2022 Seasons for MIP')
    plt.show()

corrMDeffensive2022(pureDefensive2022, mipdata2122)
For the 2022-2023 basketball season, we again plot the correlations between the defensive player statistics and Most Improved Player selection as a heatmap. Positive correlations appear in reddish tones and negative correlations in bluish tones.
def corrMDefensive2123(pureDefensive2123, mipdata2223):
    # Join the defensive stats with the MIP data on their shared index
    pureDefensive2123 = pd.merge(pureDefensive2123, mipdata2223, left_index=True, right_index=True)
    # Pairwise correlations between all columns of the merged frame
    correlation_matrix4 = pureDefensive2123.corr()
    plt.figure(figsize=(10, 8))
    # Annotate each cell with its correlation, rounded to two decimals
    sns.heatmap(correlation_matrix4, annot=True, cmap='coolwarm', fmt=".2f", annot_kws={"size": 10})
    plt.title('Correlation Matrix for 2021-2023 Seasons for MIP')
    plt.show()

corrMDefensive2123(pureDefensive2123, mipdata2223)
Finally, we plot the same heatmap for the 2023-2024 basketball season, showing the correlations between the defensive player statistics and Most Improved Player selection. Positive correlations appear in reddish tones and negative correlations in bluish tones.
def corrMDefensive2224(pureDefensive2224, mipdata2324):
    # Join the defensive stats with the MIP data on their shared index
    pureDefensive2224 = pd.merge(pureDefensive2224, mipdata2324, left_index=True, right_index=True)
    # Pairwise correlations between all columns of the merged frame
    correlation_matrix4 = pureDefensive2224.corr()
    plt.figure(figsize=(10, 8))
    # Annotate each cell with its correlation, rounded to two decimals
    sns.heatmap(correlation_matrix4, annot=True, cmap='coolwarm', fmt=".2f", annot_kws={"size": 10})
    plt.title('Correlation Matrix for 2022-2024 Seasons for MIP')
    plt.show()

corrMDefensive2224(pureDefensive2224, mipdata2324)
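The five heatmap functions above differ only in their input frames and in the season label in the title, so they could be folded into one parameterized helper. The sketch below is one possible way to do that, not the code used to produce the figures in this tutorial. The names `merged_correlation` and `plot_defensive_corr`, the demo players, and the `Rank` column in the demo frame are all hypothetical.

```python
import pandas as pd


def merged_correlation(defensive_df, mip_df):
    """Join the two frames on their index and return the correlation matrix."""
    merged = pd.merge(defensive_df, mip_df, left_index=True, right_index=True)
    return merged.corr()


def plot_defensive_corr(defensive_df, mip_df, seasons_label):
    """Draw the annotated heatmap used above for any pair of seasons."""
    # Imported lazily so the correlation step works without a plotting backend
    import matplotlib.pyplot as plt
    import seaborn as sns

    plt.figure(figsize=(10, 8))
    sns.heatmap(merged_correlation(defensive_df, mip_df), annot=True,
                cmap='coolwarm', fmt=".2f", annot_kws={"size": 10})
    plt.title(f'Correlation Matrix for {seasons_label} Seasons for MIP')
    plt.show()


# Tiny made-up frames (hypothetical players and values) to show the call shape
players = ["Player A", "Player B", "Player C"]
demo_defense = pd.DataFrame(
    {"blocks": [0.2, 0.5, 0.9], "steals": [1.0, 0.7, 0.4]}, index=players
)
demo_mip = pd.DataFrame({"Rank": [3, 2, 1]}, index=players)

demo_corr = merged_correlation(demo_defense, demo_mip)
print(demo_corr)
```

With this helper, a call such as `plot_defensive_corr(pureDefensive1820, mipdata1920, '2018-2020')` would stand in for `corrMDefensive1820`, and likewise for the other four season pairs.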
CONCLUSION
Based on the sample size of the data and the results, the parameters we selected were not statistically significant in determining a player's rank in the running for the Most Improved Player (MIP) award.

This project analyzes basketball player performance data, with a specific emphasis on comparing the actual MIP winners over multiple NBA seasons with the players who could have won based on their statistics. First, the datasets containing offensive and defensive player statistics are cleaned and preprocessed to ensure data integrity. This involves extracting relevant metrics such as points, assists, rebounds, steals, and blocks, while handling missing values appropriately. Next, the project computes the differences in player performance metrics between consecutive seasons to identify significant improvements or declines over time. Regression analyses then explore the relationship between player statistics and rankings, allowing player rankings to be predicted from performance metrics. Bar charts and heatmaps present the findings in a clear visual form.

Overall, the project aims to provide insight into player development trends and the factors that influence MIP selections in the NBA. Future studies could use data going back to the inception of the NBA to draw correlations in player development, subject to two constraints: the year in which each statistic began to be tracked, and the processing power needed to handle large data sets.